Comment by jacob_cannell on Why do we think most AIs unintentionally created by humans would create a worse world, when the human mind was designed by random mutations and natural selection, and created a better world? · 2017-05-13T21:39:09.636Z · score: 3 (3 votes) · LW · GW

The evolution of the human mind did not create a better world from the perspective of most species of the time - just ask the dodo, most megafauna, countless other species, etc. In fact, the evolution of humanity was/is a mass extinction event.

Comment by jacob_cannell on Don't Fear the Reaper: Refuting Bostrom's Superintelligence Argument · 2017-03-02T01:12:56.998Z · score: 2 (2 votes) · LW · GW

Agreed, the quoted "we found" claim overreaches. The paper does have a good point though: the recalcitrance of further improvement can't be modeled as a constant; it necessarily scales with current system capability. Real-world exponentials become sigmoids; mold growing in your fridge and a nuclear explosion are both sigmoids that look exponential at first - the difference is a matter of scale.
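A toy numerical illustration of that last point (all parameters here are arbitrary, chosen only to show the shape):

```python
import numpy as np

# A logistic (sigmoid) curve is indistinguishable from an exponential while the
# quantity is far below its ceiling; the curves only diverge as recalcitrance
# rises near the carrying capacity K.
K = 1e6        # carrying capacity (the "scale" separating mold from bombs)
r = 1.0        # growth rate per unit time
x0 = 1.0       # initial size

for t in np.linspace(0, 20, 6):
    exponential = x0 * np.exp(r * t)
    logistic = K / (1 + (K / x0 - 1) * np.exp(-r * t))
    print(f"t={t:4.1f}  exp={exponential:14.1f}  logistic={logistic:14.1f}")
# The two columns match to several digits early on; by t ~ 16-20 the logistic
# has saturated near K while the exponential keeps growing without bound.
```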

Really understanding the dynamics of a potential intelligence explosion requires digging deep into the specific details of an AGI design vs the brain in terms of inference/learning capabilities vs compute/energy efficiency, future hardware parameters, etc. Can't show much with vague broad stroke abstractions.

Comment by jacob_cannell on Open thread, Feb. 13 - Feb. 19, 2017 · 2017-02-13T23:49:44.503Z · score: 4 (4 votes) · LW · GW

The level of misunderstanding in these types of headlines is what is scary. The paper is actually about a single simple model trained for a specific purpose, unrelated to the hundreds of other models various DeepMind researchers have trained. But somehow that all too often gets reduced to "DeepMind's AI", as if it's a monolithic thing. And here it's even worse, where somehow the fictional monolithic AI and DeepMind the company are now confused into one.

Comment by jacob_cannell on Choosing prediction over explanation in psychology: Lessons from machine learning · 2017-01-18T02:43:47.322Z · score: 0 (0 votes) · LW · GW

If you instead claim that the "input" can also include observations about interventions on a variable, t

Yes - general prediction - ie a full generative model - already can encompass causal modelling, avoiding any distinctions between dependent/independent variables: one can learn to predict any variable conditioned on all previous variables.

For example, consider a full generative model of an ATARI game, which includes both the video and control input (from human play say). Learning to predict all future variables from all previous automatically entails learning the conditional effects of actions.

For medicine, the full machine learning approach would entail using all available data (test measurements, diet info, drugs, interventions, whatever, etc) to learn a full generative model, which then can be conditionally sampled on any 'action variables' and integrated to generate recommended high utility interventions.
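A minimal sketch of that idea - not any particular paper's method; the "patient" dynamics, states, and utilities below are invented purely for illustration:

```python
import random
from collections import defaultdict

# Learn a generative model over sequences that include both observations and
# action variables, then condition on candidate actions to rank interventions.
random.seed(0)

STATES = ["sick", "ok", "healthy"]
ACTIONS = ["drug_a", "drug_b", "nothing"]

def toy_world(state, action):
    # Hidden ground truth, used only to generate logged training data.
    improve = {"drug_a": 0.6, "drug_b": 0.3, "nothing": 0.1}[action]
    idx = STATES.index(state)
    if random.random() < improve and idx < 2:
        return STATES[idx + 1]
    return state

# 1. Logged data: observations plus whatever actions happened to be taken.
counts = defaultdict(lambda: defaultdict(int))
for _ in range(20000):
    s = random.choice(STATES)
    a = random.choice(ACTIONS)
    counts[(s, a)][toy_world(s, a)] += 1

# 2. The learned "generative model": P(next state | previous state, action).
def predictive(s, a):
    total = sum(counts[(s, a)].values())
    return {s2: c / total for s2, c in counts[(s, a)].items()}

# 3. Condition on each candidate action and integrate a utility to recommend one.
utility = {"sick": 0.0, "ok": 0.5, "healthy": 1.0}
for a in ACTIONS:
    eu = sum(p * utility[s2] for s2, p in predictive("sick", a).items())
    print(f"action={a:8s} expected utility={eu:.2f}")
```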

then your predictions will certainly fail unless the algorithm was trained in a dataset where someone actually intervened on X (i.e. someone did a randomized controlled trial)

In any practical near-term system, sure. In theory though, a powerful enough predictor could learn enough of the world's physics to invent de novo interventions whole cloth - e.g. AlphaGo playing moves that weren't in its training set, which it essentially learned from internal simulations.

Comment by jacob_cannell on Progress and Prizes in AI Alignment · 2017-01-04T01:53:59.134Z · score: 5 (5 votes) · LW · GW

I came to a similar conclusion a while ago: it is hard to make progress in a complex technical field when progress itself is unmeasurable or, worse, ill-defined.

Part of the problem may be cultural: most working in the AI safety field have math or philosophy backgrounds. Progress in math and philosophy is intrinsically hard to measure objectively; success is mostly about having great breakthrough proofs/ideas/papers that are widely read and well regarded by peers. If your main objective is to convince the world, then this academic system works fine - ex: Bostrom. If your main objective is to actually build something, a different approach is perhaps warranted.

The engineering oriented branches of Academia (and I include comp sci in this) have a very different reward structure. You can publish to gain social status just as in math/philosophy, but if your idea also has commercial potential there is the powerful additional motivator of huge financial rewards. So naturally there is far more human intellectual capital going into comp sci than math, more into deep learning than AI safety.

In a sane world we'd realize that AI safety is a public good of immense value that probably requires large-scale coordination to steer the tech-economy towards solving. The X-prize approach essentially is to decompose a big long term goal into subgoals which are then contracted to the private sector.

The high level abstract goal for the Ansari XPrize was "to usher in a new era of private space travel". The specific derived prize subgoal was then "to build a reliable, reusable, privately financed, manned spaceship capable of carrying three people to 100 kilometers above the Earth's surface twice within two weeks".

AI safety is a huge bundle of ideas, but perhaps the essence could be distilled down to: "create powerful AI which continues to do good even after it can take over the world."

For the Ansari XPrize, the longer term goal of "space travel" led to the more tractable short term goal of "100 kilometers above the Earth's surface twice within two weeks". Likewise, we can replace "the world" in the AI safety example:

AI Safety "XPrize": create AI which can take over a sufficiently complex video game world but still tends to continue to do good according to a panel of human judges.

To be useful, the video game world should be complex in the right ways: it needs to have rich physics that agents can learn to control, it needs to permit/encourage competitive and cooperative strategic complexity similar to that in the real world, etc. So more complex than Pac-Man, but simpler than the Matrix. Something in the vein of a Minecraft mod might have the right properties - but there are probably even more suitable open-world MMO games.

The other constraint on such a test is that we want the AI to be superhuman in the video game world, but not our world (yet). Clearly this is possible - a la AlphaGo. But naturally, the more complex the video game world is in the direction of our world, the harder the goal becomes and the more dangerous it is.

Note also that the AI should not know that it is being tested; it shall not know it inhabits a simulation. This isn't likely to be any sort of problem for the AI we can actually build and test in the near future, but it becomes an interesting issue later on.

DeepMind is now focusing on StarCraft, and OpenAI has Universe, so we're already on a related path. Competent AI for open-ended 3D worlds with complex physics - like Minecraft - is still not quite here, but is probably realizable in just a few years.

Comment by jacob_cannell on [Link] White House announces a series of workshops on AI, expresses interest in safety · 2016-05-06T06:43:19.916Z · score: 0 (0 votes) · LW · GW

A sign!

Comment by jacob_cannell on [Link] White House announces a series of workshops on AI, expresses interest in safety · 2016-05-06T06:42:55.657Z · score: 1 (1 votes) · LW · GW

Other way around. Europe's HBP started first, then the US announced the BRAIN Initiative. The HBP is centered around Markram's big sim project. The BRAIN Initiative is more like a bag of somewhat related grants, focusing more on connectome mapping. From what I remember, both projects are long term, and most of the results are expected to be 5 years out or so, but they are publishing along the way.

Comment by jacob_cannell on What can we learn from Microsoft's Tay, its inflammatory tweets, and its shutdown? · 2016-03-31T04:23:49.245Z · score: 1 (1 votes) · LW · GW

Not much.

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-31T04:18:13.095Z · score: 0 (0 votes) · LW · GW

We are in a vast, seemingly-empty universe. Models which predict the universe should be full of life should be penalised with a lower likelihood.

The only models which we can rule out are those which predict the universe is full of life which leads to long-lasting civs that expand physically, use lots of energy, and rearrange matter on stellar scales. That's an enormous number of conjunctions/assumptions about future civs. Models where the universe is full of life, but life leads to tech singularities which end physical expansion (transcension), perfectly predict our observations, as do models where civs die out, as do models where life/civs are rare, and so on...

But this is all a bit off-topic now because we are ignoring the issue I was responding to: the evidence from the timing of the origin of life on earth

If we find that life arose instantly, that is evidence we can update our models on, and it leads to different likelihoods than finding that life took 2 billion years to evolve on earth. The latter would indicate that abiogenesis is an extremely rare chemical event that requires a huge amount of random molecular computation. The former indicates otherwise.

Imagine creating a bunch of huge simulations that generate universes, and exploring the parameter space until you get something that matches earth's history. The time taken for some evolutionary event reveals information about the rarity of that event.

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-30T22:02:44.359Z · score: 0 (0 votes) · LW · GW

"Anthropic selection bias" just filters out observations that aren't compatible with our evidence. The idea that "anthropic selection bias" somehow equalizes the probability of any models which explain the evidence is provably wrong. Just wrong. (There are legitimate uses of anthropic selection bias effects, but they come up in exotic scenarios such as simulations.)

If you start from the perspective of an ideal Bayesian reasoner - a la Solomonoff - you only consider theories/models that are compatible with your observations anyway.

So there are models where abiogenesis is 'easy' (which is really too vague - so let's define that as a high transition probability per unit time, over a wide range of planetary parameters.)

There are also models where abiogenesis is 'hard' - low probability per unit time, and generally more 'sparse' over the range of planetary parameters.

By Bayes' rule, we have: P(H|E) = P(E|H) P(H) / P(E)

We are comparing two hypotheses, H1 and H2, so we can ignore P(E) - the marginal probability of the evidence - and we have:

P(H1|E) ∝ P(E|H1) P(H1)

P(H2|E) ∝ P(E|H2) P(H2)

where ∝ means 'proportional to'.

Assume for argument's sake that the model priors are the same. The posterior then just depends on the likelihood - P(E|H1) - the probability of observing the evidence, given that the hypothesis is true.

By definition, the model which predicts abiogenesis is rare has a lower likelihood.

One way of thinking about this: Abiogenesis could be rare or common. There are entire sets of universes where it is rare, and entire sets of universes where it is common. Absent any other specific evidence, it is obviously more likely that we live in a universe where it is more common, as those regions of the multiverse have more total observers like us.

Now it could be that abiogenesis is rare, but reaching that conclusion would require integrating evidence from more than earth - enough to overcome the low initial probability of rarity.
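To make the likelihood comparison concrete, here is a toy version under an assumption I'm adding for illustration: abiogenesis modeled as a Poisson process with some rate per Gyr of habitable conditions. The specific rates and the 0.1 Gyr window are made up; only the qualitative comparison matters.

```python
import math

def p_evidence(lam, window_gyr=0.1):
    # P(at least one abiogenesis event within the early window), Poisson model
    return 1.0 - math.exp(-lam * window_gyr)

hypotheses = {"easy (lam=10/Gyr)": 10.0, "hard (lam=0.01/Gyr)": 0.01}
prior = 0.5  # equal model priors, as assumed above

likelihoods = {name: p_evidence(lam) for name, lam in hypotheses.items()}
z = sum(prior * L for L in likelihoods.values())
for name, L in likelihoods.items():
    print(f"{name:20s} likelihood={L:.4f} posterior={prior * L / z:.3f}")
# With these illustrative numbers the "easy" model ends up with ~99%+ of the
# posterior mass - the qualitative point being made above.
```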

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-30T00:02:22.699Z · score: 0 (0 votes) · LW · GW

I assume by 'algea-like' you actually mean cyanobacteria. The problem is that anything that uses photosynthesis creates oxygen, and oxygen eventually depletes the planet's chemical oxygen sinks, which inevitably leads to a Great Oxygenation Event. The latter provides a powerful new source of energy for life, which then leads to something like a Cambrian explosion.

The largest uncertainty in these steps is the timeline for that oxygen to deplete the planet's oxygen sinks. This is basically the time it takes cyanobacteria to 'terraform' the planet. It took 200 million years on Earth, but this is presumably dependent on planetary chemical composition and size.

From the known exoplanets, we can already estimate there are on the order of a billion earth-size worlds in habitable zones. By the mediocrity principle, it's a priori unlikely that earth's chemistry is 1 in a billion. Especially given that Mars's composition is vaguely similar enough that it was probably an 'almost earth'.

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-29T23:51:42.393Z · score: 3 (3 votes) · LW · GW

We keep finding earlier and earlier fossil evidence for life on earth, which has finally shrunk the time window for abiogenesis on earth down to near zero.

The Late Heavy Bombardment sterilized earth repeatedly until about 4.1 billion years ago, and our earliest fossil evidence for life is also now (probably) 4.1 billion years old. Thus life probably either evolved from inorganics near-instantly, or, more likely, it was already present in the comet/dust cloud from the earth's formation (panspermia).

With panspermia, abiogenesis may be rare, but the effect is similar to abiogenesis being common.

Comment by jacob_cannell on Resolving the Fermi Paradox: New Directions · 2016-03-19T04:26:27.434Z · score: 0 (0 votes) · LW · GW

I don't see why the usual infrared argument doesn't apply to them or KIC 8462852.

If by the infrared argument you refer to the idea that a Dyson swarm should radiate in the infrared, this is probably wrong. It relies on the assumption that the alien civ operates at an earth-like temperature of 300K or so. As you reduce that temp down to 3K, the excess radiation diminishes to something indistinguishable from the CMB, so we can't detect large cold structures that way. For the reasons discussed earlier, non-zero operating temp would only be useful during initial construction phases, whereas near-zero temp is preferred in the long term. The fact that KIC 8462852 has no infrared excess makes it more interesting, not less.
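A quick sanity check of the temperature claim using Wien's displacement law (peak emission wavelength = b / T); the 300K and 3K figures are the ones from the comment above:

```python
# Wien's displacement law: peak blackbody wavelength = b / T, b ~ 2.898e-3 m*K.
# A 300 K structure peaks in the mid-infrared (the classic waste-heat signature);
# a ~3 K structure peaks around a millimetre, i.e. in the same band as the CMB.
WIEN_B = 2.898e-3  # m*K

for label, temp_k in [("warm swarm (300 K)", 300.0),
                      ("cold swarm (3 K)", 3.0),
                      ("CMB (2.725 K)", 2.725)]:
    peak_um = WIEN_B / temp_k * 1e6
    print(f"{label:20s} peak emission ~ {peak_um:10.1f} um")
```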

Comment by jacob_cannell on Resolving the Fermi Paradox: New Directions · 2016-03-18T20:11:19.129Z · score: 0 (0 votes) · LW · GW

A Dyson sphere helps with moving matter around, potentially with elemental conversion, and with cooling.

Moving matter - sure. But that would be a temporary use case, after which you'd no longer need that config, and you'd want to rearrange it back into a bunch of spherical dense computing planetoids.

potentially with elemental conversion

This is dubious. In theory you could reflect/recapture star energy to increase temperature and potentially generate metals faster, but it seems to be a huge waste of mass for a small increase in cooking rate. You'd be giving up all of your higher intelligence by not using that mass for small, compact, cold compute centers.

If nothing else, if the ambient energy of the star is a big problem, you can use it to redirect the energy elsewhere away from your cold brains.

Yes, but that's just equivalent to shielding. That only requires redirecting the tiny volume of energy hitting the planetary surfaces. It doesn't require any large structures.

Exponential growth.

Exponential growth = transcend. Exponential growth will end unless you can overcome the speed of light, which requires exotic options like new universe creation or altering physics.

I think Sandberg's calculated you can build a Dyson sphere in a century, apropos of KIC 8462852's oddly gradual dimming. And you hardly need to finish it before you get any benefits.

Got a link? I found this FAQ, where he says:

Using self-replicating machinery the asteroid belt and minor moons could be converted into habitats in a few years, while disassembly of larger planets would take 10-1000 times longer (depending on how much energy and violence was used).

That's a lognormal dist over several decades to several millennia. A dimming time for KIC 8462852 in the range of centuries to a millennium is a near-perfect (lognormal) dist overlap.

So it may be worth while investing some energy in collecting small useful stuff (asteroids) into larger, denser computational bodies. It may even be worth while moving stuff farther from the star, but the specifics really depend on a complex set of unknowns.

You say 'may', but that seems really likely.

The recent advances in metamaterial shielding suggest that low temps could be reached even on earth without expensive cooling, so the case I made for moving stuff away from the star for cooling is diminished.

Collecting/rearranging asteroids, and rearranging rare elements of course still remain as viable use cases, but they do not require as much energy, and those energy demands are transient.

After all, what 'complex set of unknowns' will be so fine-tuned that the answer will, for all civilizations, be 0 rather than some astronomically large number?

Physics. It's the same for all civilizations, and their tech paths are all the same. Our uncertainty over those tech paths does not translate into a diversity in actual tech paths.

You cannot show that this resolves the Fermi paradox unless you make a solid case that cold brains will find harnessing solar systems' energy and matter totally useless!

There is no 'paradox'. Just a large high-D space of possibilities, and observation updates that constrain that space.

I never ever claimed that cold brains will "find harnessing solar systems' energy and matter totally useless", but I think you know that. The key question is what are their best uses for the energy/mass of a system, and what configs maximize those use cases.

I showed that reversible computing implies extremely low energy/mass ratios for optimal compute configs. This suggests that advanced civs in the timeframe 100 to 1000 years ahead of us will be mass-limited (specifically rare-metal-element limited) rather than energy limited, and would prefer to convert excess energy into mass rather than the converse.

Which gets me back to a major point: endgames. For reasons I outlined earlier, I think the transcend scenarios are more likely. They have a higher initial prior, and are far more compatible with our current observations.

In the transcend scenarios, exponential growth just continues up until some point in the near future where exotic space-time manipulations - creating new universes or whatever - are the only remaining options for continued exponential growth. This leads to an exit for the civ, where from the outside perspective it either physically dies, disappears, or transitions to some final inert config. Some of those outcomes would be observable, some not. Mapping out all of those outcomes in detail and updating on our observations would be exhausting - a fun exercise for another day.

The key variable here is the timeframe from our level to the final end-state. That timeframe determines the entire utility/futility tradeoff for exploitation of matter in the system, based on ROI curves.

For example, why didn't we start converting all of the useful matter of earth into Babbage-style mechanical computers in the 19th century? Why didn't we start converting all of the matter into vacuum tube computers in the '50s? And so on...

In an exponentially growing civ like ours, you always have limited resources, and investing those resources in replicating your current designs (building more citizens/compute/machines whatever) always has complex opportunity cost tradeoffs. You also are expending resources advancing your tech - the designs themselves - and as such you never expend all of your resources on replicating current designs, partly because they are constantly being replaced, and partly because of the opportunity costs between advancing tech/knowledge vs expanding physical infrastructure.

So civs tend to expand physically at some rate over time. The key question is how long? If transcension typically follows 1,000 years after our current tech level, then you don't get much interstellar colonization bar a few probes, but you possibly get temporary dyson swarms. If it only takes 100 years, then civs are unlikely to even leave their home planet.

You only get colonization outcomes if transcension takes long enough, leading to colonization of nearby matter, which all then transcend roughly within the timeframe of their distance from the origin. Most of the nearby useful matter appears to be rogue planets, so colonization of stellar systems would take even longer, depending on how far down it is in the value chain.

And even in the non-transcend models (say the time to transcend is greater than millions of years), you can still get scenarios where the visible stars are not colonized much - if their value is really low, compared to abundant higher value cold dark matter (rogue planets, etc), colonization is slow/expensive, and the timescale spread over civ ages is low.

Comment by jacob_cannell on Resolving the Fermi Paradox: New Directions · 2016-03-17T17:44:54.084Z · score: 0 (0 votes) · LW · GW

So your entire argument boils down to another person who thinks transcension is universally convergent and this is the solution to the Fermi paradox?

No... As I said above, even if transcension is possible, that doesn't preclude some expansion. You'd only get zero expansion if transcension is really easy/fast. On the convergence issue, we should expect that the main development outcomes are completely convergent. Transcension is instrumentally convergent - it helps any realistic goals.

I don't see what your reversible computing detour adds to the discussion, if you can't show that making only a few cold brains sans any sort of cosmic engineering is universally convergent.

The reversible computing stuff is important for modeling the structure of advanced civs. Even in transcension models, you need enormous computation - and everything you could do with new universe creation is entirely compute limited. Understanding the limits of computing is important for predicting what end-tech computation looks like for both transcend and expand models. (for example if end-tech optimal were energy limited, this predicts dyson spheres to harvest solar energy)

The temperatures implied by 10,000x energy density on earth preclude all life or any interesting computation.

I never said anything about using biology or leaving the Earth intact. I said quite the opposite.

Advanced computation doesn't happen at those temperatures, for the same basic reasons that advanced communication doesn't work for extremely large values of noise in SNR. I was trying to illustrate the connection between energy flow and temperature.

You need to show your work here. Why is it unlikely? Why don't they disassemble solar systems to build ever more cold brains? I keep asking this, and you keep avoiding it.

First let us consider the optimal compute configuration of a solar system without any large-scale re-positioning, and then we'll remove that constraint.

For any solid body (planet, moon, asteroid, etc), there is some optimal compute design given its structural composition, internal temp, and incoming irradiance from the sun. Advanced compute tech doesn't require any significant energy - so being closer to the sun is not an advantage at all. You need to expend more energy on cooling (for example, it takes about 15 kilowatts to cool a single current chip from earth temp to low temps, although there have been some recent breakthroughs in passive metamaterial shielding that could change that picture). So you just use/waste that extra energy cooling the best you can.

So, now consider moving the matter around. What would be the point of building a dyson sphere? You don't need more energy. You need more metal mass, lower temperatures and smaller size. A dyson sphere doesn't help with any of that.

Basically we can rule out config changes for the metal/rocky mass (useful for compute) that: 1.) increase temperature 2.) increase size

The gradient of improvement is all in the opposite direction: decreasing temperature and size (with tradeoffs of course).

So it may be worth while investing some energy in collecting small useful stuff (asteroids) into larger, denser computational bodies. It may even be worth while moving stuff farther from the star, but the specifics really depend on a complex set of unknowns.

One of the big unknowns of course being the timescale, which depends on the transcend issue.

Now for the star itself: it has most of the mass, but that mass is not really accessible, and most of it is in low-value elements - we want more metals. It could be that the best use of that matter is to simply continue cooking it in the stellar furnace to produce more metals - as there is no other way, as far as I know.

But doing anything with the star would probably take a very long amount of time, so it's only relevant in non-transcendent models.

In terms of predicted observations, in most of these models there are few if any large structures, but individual planetary bodies will probably be altered from their natural distributions. Some possible observables: lower than expected temperatures, unusual chemical distributions, and possibly higher than expected quantities/volumes of ejected bodies.

Some caveats: I don't really have much of an idea of the energy costs of new universe creation, which is important for the transcend case. That probably is not a reversible op, and so it may be a motivation for harvesting solar energy.

There's also KIC 8462852 of course. If we assume that it is a Dyson-swarm-like object, we can estimate a rough model for civs in the galaxy. KIC 8462852 has been dimming for at least a century. It could represent the endphase of a tech civ, approaching its final transcend state. Say that takes around 1,000 years (vaguely estimating from the 100 years of data we have).

This dimming star is one out of perhaps 10 million nearby stars we have observed in this way. Say 1 in 10 systems will ever develop life, and the timescale spread is about a billion years - then we should expect roughly 1 in 10 million observed stars to be in the endphase dimming state, given that phase lasts only 1,000 years. This would of course predict a large number of stars that have already reached their end state, but given that we just barely detected KIC 8462852 because it was dimming, we probably can't yet detect stars that already dimmed and then stabilized long ago.
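The back-of-envelope arithmetic behind that estimate, using the numbers above:

```python
# Back-of-envelope version of the estimate, with the comment's own numbers.
stars_surveyed    = 10_000_000   # nearby stars observed in this way
frac_develop_life = 0.1          # 1 in 10 systems ever develop life
civ_age_spread_yr = 1e9          # spread in when civs arise, ~a billion years
dimming_phase_yr  = 1e3          # how long the end-phase dimming lasts

# Probability a surveyed star is in its dimming end-phase right now:
p_dimming_now = frac_develop_life * (dimming_phase_yr / civ_age_spread_yr)
print(f"P(dimming now)     = {p_dimming_now:.1e}")                    # 1e-7, i.e. 1 in 10 million
print(f"expected in survey = {stars_surveyed * p_dimming_now:.1f}")   # ~1 such star
```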

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-17T05:26:39.162Z · score: 0 (0 votes) · LW · GW

The low temperature low energy devices would be more akin to crazy deep extremophile lithotrophic bacteria or deep sea fish on Earth, living slow metabolisms and at low densities and matter/energy fluxes,

Hmm I think you misunderstood my model. At the limits of computation, you approach the maximal computational density - the maximum computational capacity per unit mass - only at zero temperature. The stuff you are talking about - anything that operates at any non-zero temp - has infinitely less compute capability than the zero-temp stuff.

So your model and analogy is off - the low temp devices are like gods - incomprehensibly faster and more powerful, and bio life and warm tech is like plants, bacteria, or perhaps rocks - not even comparable, not even in the same basic category of 'thing'.

In any situation other than perfect coordination, that which replicates itself more rapidly becomes more common.

Of course. But it depends on what the best way to replicate is. If new universe creation is feasible (and it appears to be, from what we know of physics), then civs advance rather quickly to post-singularity godhood and start creating new universes. Among other things, this allows exponential growth/replication, which is vastly superior to the puny polynomial growth you can get from physical interstellar colonization. (It also probably allows for true immortality, and perhaps actual magic - altering physics.) And even if that tech is hard/expensive, colonization does not entail anything big, hot, or dumb. Realistic colonization would simply result in many small, compact, cold civ objects. Also see the other thread.

Comment by jacob_cannell on Resolving the Fermi Paradox: New Directions · 2016-03-17T05:02:20.752Z · score: 0 (0 votes) · LW · GW

I understand your points about why colder is better, my question is: why don't they expand constantly with ever more cold brains, which are collectively capable of ever more computation?

At any point in development, investing resources in physical expansion has a payoff/cost/risk profile, as does investing resources in tech advancement. Spatial expansion offers polynomial growth, which is pretty puny compared to the exponential growth from tech advancement. Furthermore, the distances between stars are pretty vast.

If you plot our current trajectory forward, we get to a computational singularity long, long before any serious colonization effort. Space colonization is kind of comical in its economic payoff compared to chasing Moore's Law. So everything depends on the endpoint of the tech singularity. Does it actually end with some hard limit to tech? If it does, and slow polynomial growth is the only option after that, then you get galactic colonization as the likely outcome. If the tech singularity leads to stronger outcomes a la new universe manipulations, then you never need to colonize; it's best to just invest everything locally. And of course there is the spectrum in between, where you get some colonization, but the timescale is slowed.
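A toy comparison of the two growth modes; all numbers are invented for illustration (compute doubling every 2 years for tech advancement, versus resources proportional to a colonized volume growing as t^3 at a fixed expansion speed):

```python
DOUBLING_YEARS = 2.0
for years in [10, 50, 100, 200]:
    tech_gain = 2 ** (years / DOUBLING_YEARS)   # exponential multiple from tech
    expansion_gain = (years / 10.0) ** 3        # polynomial: colonized volume,
                                                # normalized so 10 years -> 1 unit
    print(f"{years:4d} yr   tech x{tech_gain:.3e}   expansion x{expansion_gain:.3e}")
# Within a century the exponential term dwarfs the cubic one, which is the
# "economic payoff" comparison being gestured at above.
```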

Correct me if I'm wrong, but zero energy consumption assumes both coldness and slowness, doesn't it?

No, not for reversible computing. The energy required to represent/compute a 1 bit state transition depends on reliability, temperature, and speed, but that energy is not consumed unless there is an erasure. (and as energy is always conserved, erasure really just means you lost track of a bit)

In fact the reversible superconducting designs are some of the fastest feasible in the near term.
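For concreteness, the Landauer bound puts the minimum cost of erasing one bit at k_B * T * ln(2), which is why the erasure cost scales directly with temperature and why reversible designs that avoid erasures can in principle dodge it entirely. The erasure-rate figure in the final comment is hypothetical, just to give a sense of scale:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

for label, temp_k in [("room temperature", 300.0),
                      ("liquid helium", 4.0),
                      ("deep space / CMB", 2.725)]:
    joules_per_bit = K_B * temp_k * math.log(2)  # Landauer bound per erased bit
    print(f"{label:20s} {joules_per_bit:.2e} J per erased bit")
# At 300 K this is ~2.9e-21 J per bit; a hypothetical machine erasing 1e20 bits/sec
# would dissipate ~0.3 W from erasures alone, and proportionally less as the
# operating temperature drops.
```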

That would be great. If we had 10,000x more energy (and advanced technology etc), we could disassemble the Earth, move the parts around, and come up with useful structures to compute with it which would dissipate that energy productively.

Biological computing (cells) doesn't work at those temperatures, and all the exotic tech far past bio computers requires even lower temperatures. The temperatures implied by 10,000x energy density on earth preclude all life or any interesting computation.

Yes, it is expensive. Good thing we have a star right there to move all that mass with. Maybe its energy could be harnessed with some sort of enclosure....

I'm not all that confident that moving mass out-system is actually better than just leaving it in place and doing best-effort cooling in situ. The point is that energy is not the constraint for advancing computing tech; it's more mass limited than anything, or perhaps knowledge is the most important limit. You'd never want to waste all that mass on a Dyson sphere. All of the big designs are dumb - you want it to be as small, compact, and cold as possible. More like a black hole.

Which ends in everything being used up, which even if all that planet engineering and moving doesn't require Dyson spheres, is still inconsistent with our many observations of exoplanets and

It's extremely unlikely that all the matter gets used up in any realistic development model, even with colonization. Life did not 'use up' more than a tiny fraction of the matter of earth, and so on.

leaves the Fermi paradox unresolved.

From the evidence for mediocrity, the lower KC complexity of mediocrity, and the huge number of planets in the galaxy, I start with a prior strongly favoring a reasonably high number of civs per galaxy, and low odds on us being first.

We have high uncertainty on the end/late outcome of a post-singularity tech civ (or at least I do, I get the impression that people here inexplicably have extremely high confidence in the stellavore expansionist model, perhaps because of lack of familiarity with the alternatives? not sure).

If post-singularity tech allows new universe creation and other exotic options, you never have much colonization - at least not in this galaxy, from our perspective. If it does not, and there is an eventual end of tech progression, then colonization is expected.

But as I argued above, even colonization could be hard to detect - as advanced civs will be small/cold/dark.

Transcension is strongly favored a priori for anthropic reasons - transcendent universes create far more observers like us. Then, updating on what we can see of the galaxy, colonization loses steam: our temporal rank is normal, whereas most colonization models predict we should be early.

For transcension, naturally it's hard to predict what that means... but one possibility is a local 'exit', at least from the perspective of outside observers. Creation of lots of new universes, followed by physical civ-death in this universe, but effective immortality in new universes (a la game-theoretic horse trading across the multiverse). New universe creation could also potentially alter physics in ways that permit further tech progression. Either way, all of the mass is locally invested/used up for 'magic' that is incomprehensibly more valuable than colonization.

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-16T18:06:28.583Z · score: 0 (0 votes) · LW · GW

Given that physics is the same across space, the math/physics/tech of different civs will end up being the same, more or less. I wouldn't call that coordination.

To extend your analogy, plants don't grow in the center of the earth - and this has nothing to do with coordination. Likewise, no human tribes colonized the ocean depths, and this has nothing to do with coordination.

Comment by jacob_cannell on Resolving the Fermi Paradox: New Directions · 2016-03-16T17:54:37.696Z · score: 1 (1 votes) · LW · GW

Computing near the Sun costs more because it's hotter, sure. Fortunately, I understand that the Sun produces hundreds, even thousands of times more energy than a little fusion reactor does, so some inefficiencies are not a problem.

Every practical computational tech substrate has some error bounded compute/temperature curve, where computational capability quickly falls to zero past some upper bound temperature. Even for our current tech, computational capacity essentially falls off a cliff somewhere well below 1,000K.

My general point is that the really advanced computing tech shifts all those curves over - towards lower temperatures. This is a hard limit of physics; it cannot be overcome. So for a really advanced reversible quantum computer that employs superconduction and long-coherence quantum entanglement, 1K is just as impossible as 1,000K. It's not entirely a matter of efficiency.

Another way of looking at it - advanced tech just requires lower temperatures - as temperature is just a measure of entropy (undesired/unmodeled state transitions). Temperature is literally an inverse measure of computational potential. The ultimate computer necessarily must have a temperature of zero.

You say that the reversible brains don't need that much energy.

At the limits they need zero. Approaching anything close to those limits they have no need of stars. Not only that, but they couldn't survive any energy influx much larger than some limit, and that limit necessarily must go to zero as their computational capacity approaches theoretical limits.

If it's energy, then they will want to pipe in as much energy as possible from their local star.

No. There is an exact correct amount of energy to pipe in based on their viable operating temperature of their current tech civ. And this amount goes to zero as you advance up the tech.

It may help to consider applying your statement to our current planet civ. What if we could pipe in 10000x more energy than we currently receive from the sun. Wouldn't that be great? No. It would cook the earth.

The same principle applies, but as you advance up the ultra-tech ladder, the temp ranges get lower and lower (because remember, temp is literally an inverse measure of maximum computational capability).

OK, but more computing power is always better, the cold brains want as much as possible, so what limits them?

Given some lump of matter, there is of course a maximum information storage capacity and a max compute rate - in a reversible computer the compute rate is bounded by the maximum energy density the system can structurally support which is just bounded by its mass. In terms of ultimate limits, it really depends on whether exotic options like creating new universes are practical or not. If creating new universes is feasible, there probably are no hard limits, all limits becomes soft.

So you should get a universe of Dyson spheres feeding out mass-energy to the surrounding cold brains who are constantly colonizing fresh systems for more mass-energy to compute in the voids with

Dyson spheres are extremely unlikely to be economically viable/useful, given the low value of energy past a certain tech level (vastly lower energy need per unit mass).

Cold brains need some mass, the question then is how the colonization value of mass varies across space. Mass that is too close to a star would need to be moved away from the star, which is very expensive.

So the most valuable mass that gets colonized first would be the rogue planets/nomads - which apparently are more common than attached planets.

If colonization continues long enough, it will spread to lower and lower valued real estate. So eventually smaller rocky bodies in the outer system get stripped away, slowly progressing inward.

The big unknown variable is again what the end of tech in the universe looks like, which gets back to that new universe creation question. If that kind of ultimate/magic tech is possible, civs will invest everything in to that, and you have less colonization, depending on the difficulty/engineering tradeoffs.

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-13T18:10:16.997Z · score: 2 (2 votes) · LW · GW

Depends on what you mean by 'intelligence'.

If you mean tech/culture/language capable, well it isn't surprising that has only happened once, because it is so recent, and the first tech species tends to takeover the planet and preclude others.

If you mean something more like "near-human problem solving capability", then that has evolved robustly in multiple separate vertebrate lineages: corvids, primates, cetaceans, proboscideans. It also evolved in an invertebrate lineage (octopi) with a very different brain plan. I think that qualifies as extremely robust, and it suggests that the evolution of cultural intelligence is probably inevitable, given enough time/energy/etc.

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-13T18:04:48.684Z · score: 1 (1 votes) · LW · GW

evidence of a Great Filter in our past.

Most of the space of possible great filters in the past has been ruled out. Rare planets is out. Tectonics is out. Rare bio origins is out. The mediocrity of earth's temporal rank rules out past disaster scenarios, a la Bostrom/Tegmark's article.

and the fact we don't see aliens is evidence of a Great Filter in the future.

Mediocrity of temporal rank rules out any great filter in the future that has anything to do with other civs, because in scenarios where that is the filter, surviving observers necessarily find themselves on early planets.

Furthermore, natural disasters are already ruled out as a past filter, and thus as a future filter as well.

So all that remains is this narrow space of possibilities that relate to the timescale of evolution, where earth is rare in that evolution runs unusually fast here. Given that there are many billions of planets in the galaxy in habitable zones, earth has to be 10^10 rare or so, which seems pretty unlikely at this point.

Also, 'seeing aliens' depends on our model of what aliens should look like - which really is just our model for the future of post-biological civs. Our observations currently can only rule out the stellavore expansionist model. The transcend model predicts small, cold, compact civs that would be very difficult to detect directly.

That being said, if aliens exist, the evidence may already be here, we just haven't interpreted it correctly.

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-12T22:40:20.276Z · score: 1 (1 votes) · LW · GW

So the fact that intelligence took this long to evolve - 4-5 billions of years after biogenesis, and 600-700 million years after the first multicellular animals - must be important.

~5 billion years out of an expected ~10 billion year lifespan for a star like the sun - mediocrity all the way down!

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-12T22:34:03.974Z · score: 3 (3 votes) · LW · GW

The high value matter/energy or real estate is probably a tiny portion of the total, and is probably far from stars, as stellar environments are too noisy/hot for advanced computation.

Can you expand on this?

See this post.

Extrapolating from current physics to ultimate computational intelligences, the most important constraint is temperature/noise, not energy. A hypothetical optimal SI would consume almost no energy, and its computational capability would be inversely proportional to its temperature. So at the limits you have something very small, dense, cold, and dark, approaching a black hole.

Passive shielding appears to be feasible, but said feasibility decreases non-linearly with proximity to stars.

So think of the computational potential of space-time as a function of position in the galaxy. The computational potential varies inversely with temperature. The potential near a star is abysmal. The most valuable real estate is far out in the interstellar medium, potentially on rogue planets or even smaller cold bodies, where passive shielding can help reduce temperatures down to very low levels.

So to an advanced civ, the matter in our solar system is perhaps worthless - the energy cost of pulling the matter far enough away from the star and cooling it is greater than its computational value.

All computation requires matter/energy.

Computation requires matter to store/represent information, but doesn't require consumption of that matter. Likewise computation also requires energy, but does not require consumption of that energy.

At the limits you have a hypothetical perfect reversible quantum computer, which never erases any bits. Instead, unwanted bits are recycled internally and used for RNG. This requires a perfect balance of erasure with random bit consumption, but that seems possible in theory for general approximate inference algorithms of the types SI is likely to be based on.

that the stars were huge piles of valuable materials that had inconveniently caught fire and needed to be put out.

This is probably incorrect. From the perspective of advanced civs, the stars are huge piles of worthless trash. They are the history of life rather than its future, the oceans from which advanced post-bio civs emerge.

Comment by jacob_cannell on AlphaGo versus Lee Sedol · 2016-03-12T18:34:02.939Z · score: 0 (0 votes) · LW · GW

We have wildly different definitions of interesting, at least in the context of my original statement. :)

Comment by jacob_cannell on AlphaGo versus Lee Sedol · 2016-03-12T09:02:01.739Z · score: -4 (4 votes) · LW · GW

If you can prove anything interesting about a system, that system is too simple to be interesting. Logic can't handle uncertainty, and doesn't scale at all to describing/modelling systems as complex as societies, brains, AIs, etc.

Comment by jacob_cannell on AlphaGo versus Lee Sedol · 2016-03-12T08:55:57.023Z · score: 0 (0 votes) · LW · GW

Briefly skimming Christiano's post, this is actually one of the few/first proposals from someone MIRI related that actually seems to be on the right track (and similar to my own loose plans). Basically it just boils down to learning human utility functions with layers of meta-learning, with generalized RL and IRL.

Comment by jacob_cannell on AlphaGo versus Lee Sedol · 2016-03-12T08:50:24.413Z · score: 0 (0 votes) · LW · GW

When I started hearing about the latest wave of results from neural networks, I thought to myself that Eliezer was probably wrong to bet against them. Should MIRI rethink its approach to friendliness?

Yes.

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-12T05:17:05.545Z · score: 0 (0 votes) · LW · GW

If planets like Earth were very rare in ways that didn't change much with time you'd still see a time that was typical

The time measurement is not the only rank measurement we have. We also can compare the sun vs other stars, and it is mediocre across measurements.

Rarity requires an (intrinsically unlikely, a la Solomonoff) mechanism - something unusual that happened at some point in the developmental process - and most such mechanisms would entangle with multiple measurements.

At this point in time we can pretty much rule out all mechanisms operating at the stellar scale, it would have to be something far more local.

Tectonics-as-rare has been disproven recently: Europa was recently shown to have active tectonics, possibly Pluto too, and probably Mars, at least at some point.

For later evolutionary development stuff, it will be a while before we have any data for rank measurements. But given how every other measurement so far has come up as mediocre...

We can actually learn a lot from exploring Europa, Mars, and other spots that could/should have some evidence for at least simple life. That can help fit at least a simple low-complexity model for typical planetary development.

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-12T05:01:06.549Z · score: 0 (0 votes) · LW · GW

It's not binary of course; there's a feasibility spectrum that varies with speed. On the low end there is a natural speed for slow colonization which requires very little energy/effort - colonization roughly at the speed of star orbits around the galaxy. That would take hundreds of millions of years, but it could use gravitational assists and we already have the tech. Indeed, biology itself could perhaps manage slow colonization.

Given that the galaxy is already 54 galactic-years old, if life is actually as plentiful as mediocrity suggests, then the 'too hard' explanation can't contain much probability mass - as the early civs should have arisen quite some time ago.
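Rough numbers behind the "54 galactic-years" figure and the orbit-speed colonization timescale mentioned above (standard order-of-magnitude values):

```python
GALAXY_AGE_YR    = 13.5e9     # age of the Milky Way disk, roughly
GALACTIC_YEAR    = 250e6      # one solar orbit of the galactic center, ~225-250 Myr
DISK_DIAMETER_LY = 100_000    # stellar disk diameter in light years
ORBIT_SPEED_KMS  = 230.0      # typical orbital speed of disk stars
C_KMS            = 299_792.458

print(f"galactic years elapsed ~ {GALAXY_AGE_YR / GALACTIC_YEAR:.0f}")

# A colonization wavefront that merely rides along at stellar-orbit speeds:
speed_fraction_c = ORBIT_SPEED_KMS / C_KMS
crossing_time_yr = DISK_DIAMETER_LY / speed_fraction_c
print(f"disk-crossing time at orbital speed ~ {crossing_time_yr / 1e6:.0f} Myr")
```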

I find it more likely that the elder civs already have explored, and that the galaxy is already 'colonized'. It is unlikely that advanced civs are stellavores. The high value matter/energy or real estate is probably a tiny portion of the total, and is probably far from stars, as stellar environments are too noisy/hot for advanced computation. We have little hope of finding them until after our own maturation to some post-singularity state.

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-11T19:08:32.436Z · score: 0 (0 votes) · LW · GW

I take it as strong evidence for Rare earth.

It's the exact opposite.

If the earth was rare, this rarity would show up in the earth's rank along many measurement dimensions. Rarity requires selection pressure - a filter - which alters the distribution. We don't see that at all. Instead we see no filtering, no unusual rank in the dimensions we can measure. The exact opposite is far more likely true - the earth is common.

For instance, say that the earth was rare in orbiting a rare type of star. Then we would see that the sun would have unusual rank along many dimensions. Instead it is normal/typical - in brightness, age, type, planets, etc.

Comment by jacob_cannell on Astrobiology, Astronomy, and the Fermi Paradox II: Space & Time Revisited · 2016-03-11T05:42:58.406Z · score: 3 (3 votes) · LW · GW

I take this as another sign favoring transcension over expansion, and also weird-universes.

The standard dev model is expansion - habitable planets lead to life leads to intelligence leads to tech civs which then expand outward.

If the standard model were correct, barring any weird late filter, then the first civ to form in each galaxy would colonize the rest and thus preclude other civs from forming.

Given that the strong mediocrity principle holds - habitable planets are the norm, life is probably the norm, there is an enormous expected number of bio worlds, etc. - then if the standard model is correct, most observers will find themselves on an unusually early planet, because the elder civs prevent late civs from forming.

But that isn't the case, so that model is wrong. In general it looks like a filter is hard to support, given how strongly all the evidence has lined up for mediocrity, and the inherent complexity penalty.

Transcension remains as a viable alternative. Instead of expanding outward, each civ progresses to a tech singularity and implodes inward, perhaps by creating new baby universes, and perhaps using that to alter the distribution over the multiverse, and thus gaining the ability to effectively alter physics (as current models of baby universe creation suggest the parent universe has some programming-level control over the physics of the seed). This would allow exponential growth to continue, which is enormously better than expansion, which only provides polynomial growth. So everyone does this if it's possible. Furthermore, if it's possible anywhere in the multiverse, then those pockets expand faster, and thus they have dominated and will dominate everywhere. So if that's true, the multiverse has been and will be edited/restructured/shaped by (tiny, compressed, cold, invisible) gods.

Barring transcension weirdness, another possibility is that the multiverse is somehow anthropically tuned for about 1 civ per galaxy, and galaxy size is co-tuned for this, as it provides a nice-sized niche for evolution, similar to the effect of continent/island distributions on the earth scale. Of course, this still requires a filter, which has a high complexity penalty.

Comment by jacob_cannell on AIFoom Debate - conclusion? · 2016-03-11T05:19:20.967Z · score: 0 (0 votes) · LW · GW

I don't understand your statement.

I didn't say anything in my post above about the per-neuron state, because it's not important. Each neuron is a low-precision analog accumulator, roughly up to 8-10 bits, and there are 20 billion neurons in the cortex. There are another 80 billion in the cerebellum, but they are unimportant.

The memory cost of storing the state for an equivalent ANN is far less than 20 billion bytes or so, because of compression - most of that state is just zero most of the time.

In terms of computation per neuron per cycle, when a neuron fires it does #fanout computations. Counting from the total synapse numbers is easier than estimating neurons * avg fanout, but gives the same results.

When a neuron doesn't fire, it doesn't compute anything of significance. This is true in the brain and in all spiking ANNs, as it's equivalent to sparse matrix operations - where the computational cost depends on the number of nonzeros, not the raw size.

Comment by jacob_cannell on AIFoom Debate - conclusion? · 2016-03-10T23:04:37.202Z · score: 2 (2 votes) · LW · GW

The human brain has about 100bn neurons and operates at 100Hz. The NVIDIA Tesla K80 has 8.73TFLOPS single-precision performance with 24GB of memory. That's 1.92bits per neuron and 0.87 floating point operations per neuron-cycle. Sorry, no matter how you slice it, neurons are complex things that interact in complex ways. There is just no possible way to do a full simulation with ~2 bits per neuron and ~1 flop per neuron-cycle

You are assuming enormously suboptimal/naive simulation. Sure if you use a stupid simulation algorithm, the brain seems powerful.

As a sanity check, apply your same simulation algorithm to simulating the GPU itself.

It has 8 billion transistors that cycle at 1 GHz, with a typical fanout of 2 to 4. So that's more than 10^19 gate ops/second! Far more than the brain...

The brain has about 100 trillion synapses, and the average spike rate is around 0.25 Hz (yes, really). So that's only about 25 trillion synaptic events/second. Furthermore, the vast majority of those synapses are tiny and activate on an incoming spike with low probability - around 25% to 30% or so (stochastic connection dropout). The average synapse has an SNR equivalent of 4 bits or less. All of these numbers are well supported by the neuroscience lit.

Thus the brain as a circuit computes with < 10 trillion low bit ops/second. That's nothing, even if it's off by 10x.
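The arithmetic behind both circuit-level estimates, using the numbers stated above:

```python
# GPU as a circuit:
transistors  = 8e9
clock_hz     = 1e9
fanout       = 3            # "typical fanout of 2 to 4"
gpu_gate_ops = transistors * clock_hz * fanout
print(f"GPU gate ops/sec       ~ {gpu_gate_ops:.1e}")   # > 1e19

# Brain as a circuit:
synapses      = 100e12      # ~100 trillion synapses
avg_spike_hz  = 0.25        # average firing rate
release_prob  = 0.3         # stochastic transmission probability per spike
brain_syn_ops = synapses * avg_spike_hz * release_prob
print(f"brain synaptic ops/sec ~ {brain_syn_ops:.1e}")  # < 1e13 low-precision ops
```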

Also, synapse memory isn't so much an issue for ANNs, as weights are easily compressed 1000x or more by various schemes, from simple weight sharing to more complex techniques such as tensorization.

As we now approach the end of Moore's law, our low-level circuit efficiency has already caught up to the brain, or is close. The remaining gap is almost entirely algorithmic-level efficiency.

Comment by jacob_cannell on AIFoom Debate - conclusion? · 2016-03-09T04:26:20.166Z · score: 0 (0 votes) · LW · GW

While that particular discussion is quite interesting, it's irrelevant to my point above - which is simply that once you achieve parity, it's trivially easy to get at least weak superhuman performance through speed.

Comment by jacob_cannell on AIFoom Debate - conclusion? · 2016-03-07T21:15:53.801Z · score: 2 (2 votes) · LW · GW

The tl;dr is what I wrote: learning cycles would be hours or days, and a foom would require hundreds or thousands of learning cycles at minimum.

Much depends on what you mean by "learning cycle" - do you mean a complete training iteration (essentially a lifetime) of an AGI? Grown from seed to adult?

I'm not sure where you got the 'hundreds to thousands' of learning cycles from either. If you want to estimate the full experimental iteration cycle count, it would probably be better to estimate from smaller domains. Like take vision - how many full experimental cycles did it take to get to current roughly human-level DL vision?

It's hard to say exactly, but it is roughly on the order of 'not many' - we achieved human-level vision with DL very soon after the hardware capability arrived.

If we look in the brain, we see that vision is at least 10% of the total computational cost of the entire brain, and the brain uses the same learning mechanisms and circuit patterns to solve vision as it uses to solve essentially everything else.

Likewise, we see that once we (roughly, kind of) solved vision in the very general way the brain does, those same general techniques essentially work for all other domains.

There is just no plausible way for an intelligence to magic itself to super intelligence in less than large human timescales.

Oh, that's easy - as soon as you get one adult, human-level AGI running compactly on a single GPU, you can then trivially run it 100x faster on a supercomputer, and/or replicate it a million-fold or more. That generation of AGI then quickly produces the next, and then singularity.

It's slow going until we get up to that key threshold of brain compute parity, but once you pass that we probably go through a phase transition in history.

Comment by jacob_cannell on AIFoom Debate - conclusion? · 2016-03-07T21:04:20.104Z · score: 3 (3 votes) · LW · GW

AlphaGo represents a general approach to AI, but its instantiation on the specific problem of Go tightly constrains the problem domain and solution space ..

Sure, but that wasn't my point. I was addressing key questions of training data size, sample efficiency, and learning speed. At least for Go, vision, and related domains, the sample efficiency of DL based systems appears to be approaching that of humans. The net learning efficiency of the brain is far beyond current DL systems in terms of learning per joule, but the gap in terms of learning per dollar is less, and closing quickly. Machine DL systems also easily and typically run 10x or more faster than the brain, and thus learn/train 10x faster.

Comment by jacob_cannell on Updating towards the simulation hypothesis because you think about AI · 2016-03-07T20:49:26.755Z · score: 0 (0 votes) · LW · GW

Consider three types of universes ...

You are privileging your hypothesis - there are vastly more types of universes ...

There are universes where life develops and civilizations are abundant, and all of our observations to date are compatible with the universe being filled with advanced civs (which probably become mostly invisible to us given current tech as they approach optimal physical configurations of near zero temperature and tiny size).

There are universes like the above where advanced civs spawn new universes to gain god-like 'magic' anthropic powers, effectively manipulating/rewriting the laws of physics.

Universes in both of these categories are more aggressive/capable replicators - they create new universes at a higher rate, so they tend to dominate any anthropic distribution.

And finally, there are considerations where the distribution over simulation observer moments diverges significantly from original observer moments, which tends to complicate these anthropic considerations.

For example, we could live in a universe with lots of civs, but they tend to focus far more simulations on the origins of the first civ or early civs.

Comment by jacob_cannell on Request for help with economic analysis related to AI forecasting · 2016-02-07T01:50:52.189Z · score: 0 (0 votes) · LW · GW

This isn't a new phenomenon: the word 'computer' originally described a human occupation, perhaps the first major 'mental' occupation to be completely automated.

That started around WW2, so this general trend has been visible for the last 75 years or so. I'd look at the fraction of the economy going into computing; how that fraction has changed over time reflects the interplay between the various effects of automation.

Comment by jacob_cannell on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning · 2016-01-31T00:06:02.792Z · score: 0 (2 votes) · LW · GW

Raw and "peak performance" FLOPS numbers should be taken with a grain of salt.

Yeah, but in this case the best convolution and GEMM kernels can reach something like 98% of peak for the simple standard algorithms on dense input - which is what most ANNs use for just about everything.
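
As a rough sketch of how such a claim gets checked: time a dense matmul and divide by a quoted peak. This runs against whatever BLAS numpy is linked to, and the 6.1 TFLOPS peak is just the TitanX figure quoted elsewhere in this thread - swap in your own hardware's number.

    # Measure dense matmul throughput and compare it to an assumed peak figure.
    import time
    import numpy as np

    n = 4096
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    a @ b                                  # warm-up
    t0 = time.perf_counter()
    c = a @ b
    dt = time.perf_counter() - t0

    flops = 2 * n ** 3                     # multiply-adds in an n x n x n GEMM
    achieved = flops / dt                  # FLOP/s actually delivered
    assumed_peak = 6.1e12                  # quoted TitanX single-precision peak (assumption)
    print(f"{achieved / 1e9:.0f} GFLOP/s, {100 * achieved / assumed_peak:.2f}% of the assumed peak")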

given that a TitanX apparently draws as much as 240W of power at full load, your "petaflop-scale supercomputer" will cost you a few hundred-thousand dollars and draw 42kW to do what the brain does within 20W or so

Well, in the case of Go and an increasing number of other domains, it can do far more than any brain: it learns far faster. Also, the current implementations are very, very far from optimal form. There is at least another 100x to 1000x of easy performance improvement in the years ahead. So what 100 GPUs can do now will be accomplished by a single GPU in just a year or two.

It's just a circuit, and it obeys the same physical laws.

Of course. Neuroglia are not magic or "woo". They're physical things, much like silicon chips and neurons.

Right, and they use a small fraction of the energy budget, and thus can't contribute much to the computational power.

Comment by jacob_cannell on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning · 2016-01-30T18:34:46.682Z · score: 1 (1 votes) · LW · GW

For the SL phase, they trained 340 million updates with a batch size of 16, so 5.4 billion position-updates. However, the database had only 29 million unique positions. That's about 200 gradient iterations per unique position.

The self-play RL phase for AlphaGo consisted of 10,000 minibatches of 128 games each, so about 1 million games total. They only trained that part for a day.

They spent more time training the value network: 50 million minibatches of 32 board positions, so about 1.6 billion positions. That's still much smaller than the SL training phase.
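
Those figures are just products of the numbers reported in the paper; spelled out:

    # Back-of-envelope check of the training figures quoted above.
    sl_updates, sl_batch = 340e6, 16
    unique_positions = 29e6
    print(sl_updates * sl_batch)                        # ~5.4e9 position-updates in the SL phase
    print(sl_updates * sl_batch / unique_positions)     # ~190, i.e. the 'about 200' passes per unique position

    rl_minibatches, games_per_minibatch = 10_000, 128
    print(rl_minibatches * games_per_minibatch)         # ~1.3e6 self-play games

    value_minibatches, positions_per_minibatch = 50e6, 32
    print(value_minibatches * positions_per_minibatch)  # 1.6e9 positions for the value network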

Comment by jacob_cannell on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning · 2016-01-30T18:26:30.907Z · score: 2 (2 votes) · LW · GW

so a 10 year pro may be familiar with say 100,000 games.

That's 27.4 games a day, on average. I think this is an overestimate.

It was my upper bound estimate, and if anything it was too low.

A pro will grow up in a dedicated go school where there are hundreds of other players just playing go and studying go all day. Some students will be playing speed games, and some will be flipping through summaries of historical games in books/magazines and/or on the web.

When not playing, people will tend to walk around and spectate other games (nowadays this is also trivial to do online). An experienced player can reconstruct some of the move history just by glancing at the board.

So if anything, 27.4 games watched/skimmed/experienced per day is too low for the upper estimate.

Comment by jacob_cannell on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning · 2016-01-30T18:18:53.344Z · score: 3 (5 votes) · LW · GW

Both deep networks and the human brain require lots of data, but the kind of data they require is not the same. Humans engage mostly in semi-supervised learning, where supervised data comprises a small fraction of the total.

This is probably a misconception for several reasons. Firstly, given that we don't fully understand the learning mechanisms in the brain yet, it's unlikely that it's mostly one thing. Secondly, we have some pretty good evidence for reinforcement learning in the cortex, hippocampus, and basal ganglia. We have evidence for internally supervised learning in the cerebellum, and unsupervised learning in the cortex.

The point being: these labels aren't all that useful. Efficient learning is multi-objective and doesn't cleanly divide into these narrow categories.

The best current guess for questions like this is almost always that the brain's solution is highly efficient, given its constraints.

In the situation where a Go player experiences/watches a game between two other players far above their own current skill, the optimal learning update is probably going to be an SL-style update. Even if you can't understand the reasons behind the moves yet, it's best to compress them into the cortex for later. If you can do a local search to understand why a move is good, then that is even better, and it becomes more like RL - but again, these hard divisions are arbitrary and limiting.
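
For what "an SL-style update" means concretely, here is a minimal behavior-cloning step in PyTorch on a single observed expert move. `policy_net`, `board_tensor`, and `expert_move_index` are hypothetical placeholders, not anything from the AlphaGo paper.

    # Compress an observed stronger player's move into the policy network:
    # a plain cross-entropy (behavior cloning) step toward the expert's choice.
    import torch
    import torch.nn.functional as F

    def sl_update(policy_net, optimizer, board_tensor, expert_move_index):
        logits = policy_net(board_tensor)              # shape (1, num_legal_moves)
        target = torch.tensor([expert_move_index])
        loss = F.cross_entropy(logits, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()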

A few hundred TitanX's can muster up perhaps a petaflop of compute.

Could you elaborate? I think this number is too high by roughly one order of magnitude.

The GTX TitanX has a peak performance of 6.1 teraflops, so you'd need only a few hundred of them to get a petaflop supercomputer (more specifically, around 175).

The high-end estimate of the brain is 10 petaflops (100 trillion synapses * 100 Hz max firing rate).
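
Spelling out the arithmetic behind those numbers (the 240 W per-card figure is the one quoted elsewhere in this thread; the rest are just the peak specs):

    # GPUs needed for a petaflop, and the resulting power draw, vs the brain's ~20 W.
    peak_per_gpu = 6.1e12                   # TitanX single-precision peak, FLOP/s
    print(1e15 / peak_per_gpu)              # ~164 cards at 100% efficiency; ~175 with some headroom

    watts_per_gpu = 240                     # per-card draw quoted in this thread
    print(175 * watts_per_gpu)              # ~42,000 W for the petaflop cluster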

Estimating the computational capability of the human brain is very difficult. Among other things, we don't know what the neuroglia cells may be up to, and these are just as numerous as neurons.

It's just a circuit, and it obeys the same physical laws. We have this urge to mystify it for various reasons. Neuroglia cannot possibly contribute more to the total compute power than the neurons, based on simple physics/energy arguments. It's another stupid red herring like quantum woo.

These estimates are only validated when you can use them to make predictions. And if you have the right estimate (brain roughly equivalent to 100 teraflops, give or take an order of magnitude), you can roughly predict the outcome of many comparisons between brain circuits and equivalent ANN circuits (more accurately than you could using the wrong estimates).

Comment by jacob_cannell on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning · 2016-01-30T00:22:49.673Z · score: 10 (10 votes) · LW · GW

This is a big deal, and it is another sign that AGI is near.

Intelligence boils down to inference. Go is an interesting case because good play for both humans and bots like AlphaGo requires two specialized types of inference operating over very different timescales:

  • rapid combinatoric inference over move sequences during a game (planning). AlphaGo uses MCTS (Monte Carlo tree search) for this, whereas the human brain uses a complex network of modules involving the basal ganglia, hippocampus, and PFC.
  • slow deep inference over a huge amount of experience to develop strong pattern recognition and intuitions (deep learning). AlphaGo uses deep supervised and reinforcement learning via SGD over a CNN for this. The human brain uses the cortex.

Machines have been strong in planning/search-style inference for a while. It is only recently that the slower learning component (2nd-order inference over circuit/program structure) has started to approach and surpass human level.
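
As a concrete illustration of the first kind of inference, here is a minimal UCT selection rule, the core of MCTS-style planning; AlphaGo's actual variant layers policy-network priors and value-network estimates on top of this.

    # Pick which child move to explore next: exploitation (average value so far)
    # plus an exploration bonus for under-visited moves (UCB1 applied to trees).
    import math

    def uct_select(children, c=1.4):
        """children: list of (total_value, visit_count) tuples, one per candidate move."""
        parent_visits = sum(visits for _, visits in children)

        def score(child):
            value, visits = child
            if visits == 0:
                return float('inf')        # always try unvisited moves first
            exploit = value / visits
            explore = c * math.sqrt(math.log(parent_visits) / visits)
            return exploit + explore

        return max(range(len(children)), key=lambda i: score(children[i]))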

Critics like to point out that DL requires tons of data, but so does the human brain. A more accurate comparison requires quantifying the dataset human pro go players train on.

A 30-year-old Asian pro will have perhaps 40,000 hours of playing experience (20 years * 50 weeks/year * 40 hrs/week). The average game duration is perhaps an hour and consists of 200 moves. In addition, pros (and even fans) study published games. Reading a game takes less time, perhaps as little as 5 minutes or so.

So we can estimate very roughly that a top pro will have absorbed between 100,000 and 1 million games, and between 20 and 200 million individual positions (around 200 moves per game).

AlphaGo was trained on the KGS dataset: 160,000 games and 29 million positions. So it did not train on significantly more data than a human pro. The data quantities are actually very similar.

Furthermore, the pro's dataset is perhaps of higher quality, as it consists mainly of pro-level games, whereas the AlphaGo dataset is mostly amateur-level.

The main difference is speed. The human brain's 'clock rate' or equivalent is about 100 Hz, whereas AlphaGo's various CNNs can run at roughly 1,000 Hz during training on a single machine, and perhaps a 10,000 Hz equivalent distributed across hundreds of machines. 40,000 hours - a lifetime of experience - can be compressed 100x or more into just a couple of weeks for a machine. This is the key lesson here.
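
Spelled out, the estimates in the last few paragraphs are just a chain of multiplications:

    # Human-pro training set and the wall-clock compression from running faster.
    hours = 20 * 50 * 40                     # 20 years * 50 weeks/year * 40 hrs/week = 40,000
    moves_per_game = 200
    games_low, games_high = 100_000, 1_000_000
    print(games_low * moves_per_game, games_high * moves_per_game)   # 20M to 200M positions

    speedup = 100
    print(hours / speedup / 24)              # ~17 days: a lifetime compressed into ~2 weeks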

The classification CNN trained on KGS was run for 340 million steps, which is about 10 minibatch steps per unique position in the database.

The ANNs that AlphaGo uses are much, much smaller than a human brain, but the brain has to do a huge number of other tasks, and it also has to solve complex vision and motor problems just to play the game. AlphaGo's ANNs get to focus purely on Go.

A few hundred TitanX's can muster up perhaps a petaflop of compute. The high-end estimate of the brain is 10 petaflops (100 trillion synapses * 100 Hz max firing rate). The more realistic estimate is 100 teraflops (100 trillion synapses * 1 Hz avg firing rate), and the lower end is 1/10 that or less.
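
The three brain estimates, spelled out:

    # Synapse-count * firing-rate estimates of brain 'FLOPS'.
    synapses = 100e12
    print(synapses * 100)    # high end:  1e16 = 10 petaflops  (100 Hz max firing rate)
    print(synapses * 1)      # realistic: 1e14 = 100 teraflops (1 Hz average firing rate)
    print(synapses * 0.1)    # low end:   1e13 = 10 teraflops or less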

So why is this a big deal? Because it suggests that training a DL AI to master more economically important tasks, such as becoming an expert-level programmer, could be much closer than people think.

The techniques used here are nowhere near their optimal form yet in terms of efficiency. When Deep Blue beat Kasparov in 1997, it required a specialized supercomputer and a huge team. Ten years later, chess bots written by individual programmers running on modest PCs soared past Deep Blue - thanks to more efficient algorithms and implementations.

Comment by jacob_cannell on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning · 2016-01-29T23:53:04.277Z · score: 2 (2 votes) · LW · GW

Humans also learn extensively by studying the games of experts. In Japan/China, even fans follow games from newspapers.

A game might take an hour on average. So a pro with 10 years of experience may have played/watched upwards of 10,000 games. However, it takes much less time to read a game that has already been played - so a 10-year pro may be familiar with, say, 100,000 games. Considering that each game has 200+ moves, that is roughly a training set on the order of 2 to 20 million positions.

AlphaGo's training set consisted of 160,000 games with 29 million positions, so the upper end estimate for humans is similar. More importantly, the human training set is far more carefully curated and thus of higher quality.

Comment by jacob_cannell on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning · 2016-01-29T23:40:51.658Z · score: 4 (4 votes) · LW · GW

If you had been trying to predict the future of flight in 1900, you'd have done pretty poorly by surveying experts. You would have done far better by taking a Kurzweil-style approach, putting combustion engine performance on a chart and comparing it to estimates of the power-to-weight ratios required for flight.

Comment by jacob_cannell on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning · 2016-01-29T23:37:55.915Z · score: 0 (0 votes) · LW · GW

It's actually much worse than that, because huge breakthroughs themselves are what create new experts. So on the eve of a huge breakthrough, currently recognized experts invariably predict that it is far off, simply because they can't see the novel path towards the solution.

In this sense everyone who is currently an AI expert is, trivially, someone who has failed to create AGI. The only experts who have any sort of clear understanding of how far AGI is are either not currently recognized or do not yet exist.

Comment by jacob_cannell on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning · 2016-01-29T21:58:41.340Z · score: 1 (1 votes) · LW · GW

Specifically, I meant approximate Bayesian inference over the tensor program space to learn the ANN, not that the ANN itself needs to implement Bayesian inference (although it will naturally tend to learn that, as we see in all the evidence for various Bayesian operations in the brain).

Comment by jacob_cannell on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning · 2016-01-29T21:54:09.725Z · score: 3 (3 votes) · LW · GW

And in a laptop the same circuitry that it is used to run a spreadsheet is used to play a video game.

Exactly, and this is a good analogy to illustrate my point. Discovering that the cortical circuitry is universal rather than task-specific (like an ASIC) was a key discovery.

Human-level (perhaps weakly superhuman) vision is achieved only in very specific tasks where large supervised datasets are available.

Note that I didn't say we have solved vision to a superhuman level - but the quoted claim is simply not true either. Current SOTA nets can achieve human-level performance in at least some domains using modest amounts of unsupervised data combined with small amounts of supervised data.

Human vision builds on enormous amounts of unsupervised data - much larger than ImageNet. Learning in the brain is complex and multi-objective, but perhaps best described as self-supervised (unsupervised meta-learning of sub-objective functions which then can be used for supervised learning).

A five-year-old will have experienced perhaps 50 million seconds' worth of video data. ImageNet consists of 1 million images, which is vaguely equivalent to 1 million seconds of video if we include 30x amplification for small translations/rotations.
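
A rough back-of-envelope for those two figures, assuming roughly 12 waking hours a day and 30 fps video (both assumptions, not measurements):

    # Visual experience of a five-year-old vs an ImageNet-scale dataset.
    seconds_awake = 5 * 365 * 12 * 3600          # ~79 million seconds; 50M is a conservative round-down
    print(seconds_awake)

    imagenet_images = 1e6
    augmentations = 30                           # small translations/rotations per image
    fps = 30
    print(imagenet_images * augmentations / fps) # ~1 million 'seconds' of equivalent video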

The brain's vision system is about 100x larger than current 'large' vision ANNs. But if DeepMind decided to spend the cash on that and made it a huge one-off research priority, do you really doubt that they could build a superhuman general vision system that learns with a similar dataset and training duration?

So are things like AIXI-tl, Hutter-search, Gödel machine, and so on. Yet I would not consider any of them as the "foundational aspect" of intelligence.

The foundation of intelligence is just inference - simply because universal inference is sufficient to solve any other problem. AIXI is already simple, but you can make it even simpler by replacing the planning component with inference over high EV actions, or even just inference over program space to learn approx planning.

So it all boils down to efficient inference. The new exciting progress in DL - for me at least - is in understanding how successful empirical optimization techniques can be derived as approximate inference update schemes with various types of priors. This is what I referred to as new and upcoming "Bayesian methods" - Bayesian-grounded DL.
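
One concrete, executable example of that connection: adding properly scaled Gaussian noise to plain SGD gives stochastic gradient Langevin dynamics (Welling & Teh 2011), which turns the optimizer into an approximate posterior sampler. A minimal sketch, not anyone's production training loop:

    # One SGLD step: SGD on the (scaled) log-posterior gradient plus Gaussian noise
    # whose variance equals the step size, so the iterates approximately sample the posterior.
    import numpy as np

    def sgld_step(theta, grad_log_prior, grad_log_lik_batch, n_total, n_batch, step_size):
        grad = grad_log_prior(theta) + (n_total / n_batch) * grad_log_lik_batch(theta)
        noise = np.random.normal(0.0, np.sqrt(step_size), size=theta.shape)
        return theta + 0.5 * step_size * grad + noise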

Comment by jacob_cannell on Open thread, Jan. 25 - Jan. 31, 2016 · 2016-01-29T17:56:14.851Z · score: 2 (2 votes) · LW · GW

I think the Hansonian EM scenario is probably closer to the truth than the others, but it perhaps focuses too much on generalists. The DL explosion will also result in vastly powerful specialists that are still general enough to do complex human jobs, but remain limited or savant-like in other respects. Yes, there's a huge market for generalists, but that isn't the only niche.

Take this Go AI for example - critics like to point out that it can't drive a car, but why would you want it to? Car driving is a different niche, which will be handled by networks specifically trained for that niche to superhuman level. A generalist AGI could 'employ' these various specialists as needed, perhaps on fast timescales.

Specialization in human knowledge has increased over time, and AI will accelerate that trend.

Comment by jacob_cannell on [Link] AlphaGo: Mastering the ancient game of Go with Machine Learning · 2016-01-29T17:48:26.364Z · score: 4 (4 votes) · LW · GW

Humans need extensive training to become competent, as will AGI, and this should have been obvious to anyone with a good understanding of ML.

[Link]: KIC 8462852, aka WTF star, "the most mysterious star in our galaxy", ETI candidate, etc.

2015-10-20T01:10:30.548Z · score: 5 (6 votes)

The Unfriendly Superintelligence next door

2015-07-02T18:46:22.116Z · score: 51 (53 votes)

Analogical Reasoning and Creativity

2015-07-01T20:38:38.658Z · score: 25 (26 votes)

The Brain as a Universal Learning Machine

2015-06-24T21:45:33.189Z · score: 96 (93 votes)

[Link] Word-vector based DL system achieves human parity in verbal IQ tests

2015-06-13T23:38:54.543Z · score: 8 (9 votes)

Resolving the Fermi Paradox: New Directions

2015-04-18T06:00:33.871Z · score: 12 (19 votes)

Transhumanist Nationalism and AI Politics

2015-04-11T18:39:42.133Z · score: 0 (9 votes)

Resurrection through simulation: questions of feasibility, desirability and some implications

2012-05-24T07:22:20.480Z · score: 9 (16 votes)

The Generalized Anti-Pascal Principle: Utility Convergence of Infinitesimal Probabilities

2011-12-18T23:47:31.817Z · score: -4 (13 votes)

Feasibility of Creating Non-Human or Non-Sentient Machine Intelligence

2011-12-10T03:49:27.656Z · score: -4 (5 votes)

Subjective Relativity, Time Dilation and Divergence

2011-02-11T07:50:44.489Z · score: 16 (38 votes)

Fast Minds and Slow Computers

2011-02-05T10:05:33.734Z · score: 26 (43 votes)

Rational Health Optimization

2010-09-18T19:47:02.687Z · score: 20 (41 votes)

Anthropomorphic AI and Sandboxed Virtual Universes

2010-09-03T19:02:03.574Z · score: 2 (43 votes)

Dreams of AIXI

2010-08-30T22:15:04.520Z · score: -1 (28 votes)