Posts

Comments

Comment by MrFailSauce (patrick-cruce) on Unchangeable Code possible ? · 2022-04-16T18:09:57.694Z · LW · GW

A traditional Turing machine doesn't make a distinction between program and data. The program/data distinction is really a hardware efficiency optimization that came from the Harvard architecture. Since so many systems turn out to be Turing complete, creating a truly immutable program seems impossible to me.

For example, a system capable of speech could exploit the Turing completeness of formal grammars to execute de novo subroutines.

A second example: hackers were able to exploit the surprising Turing completeness of an image compression standard to embed a virtual machine in a GIF.

https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html
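
To make the program/data point concrete, here is a minimal sketch (the interpreter and its token format are invented for illustration): anything that interprets sufficiently rich input is, in effect, running a program supplied as data.

```python
# Minimal sketch: "data" driving a trivial stack-machine interpreter.
# Any system that interprets sufficiently rich input is, in effect,
# executing a program supplied as data.

def run(data: str) -> list:
    """Interpret a whitespace-separated token string as a tiny stack program."""
    stack = []
    for token in data.split():
        if token == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            stack.append(int(token))  # any other token is treated as a number
    return stack

print(run("2 3 ADD 4 MUL"))  # the input string *is* the program -> [20]
```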

Comment by MrFailSauce (patrick-cruce) on Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon · 2022-04-16T14:52:58.739Z · LW · GW

I feel like an important lesson to learn from the analogy to air conditioners is that some technologies are bounded by physics and cannot improve quickly (or at all). I doubt anyone has the data, but I would be surprised if average air-conditioning efficiency in BTU per watt, plotted over the 20th century, is not a sigmoid.

Comment by MrFailSauce (patrick-cruce) on Ukraine Post #5: Bits of Information · 2022-03-22T01:51:45.417Z · LW · GW

For seeing through the fog of war, I'm reminded of the German Tank Problem.  

https://en.wikipedia.org/wiki/German_tank_problem

Statistical estimates were ~50x more accurate than intelligence estimates in the canonical example. When you include the strong and reasonable incentives for all participants to propagandize, it is nearly impossible to get accurate information about an ongoing conflict.
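
For the curious, here is a small sketch of the frequentist estimator (max + max/k - 1, where the max is the largest observed serial number and k the sample size), run on made-up numbers rather than the historical data:

```python
import random

def german_tank_estimate(serials):
    """Frequentist minimum-variance unbiased estimator: max + max/k - 1."""
    m, k = max(serials), len(serials)
    return m + m / k - 1

# Toy check: a true production run of 1000 "tanks", 10 captured serial numbers.
random.seed(0)
true_n = 1000
captured = random.sample(range(1, true_n + 1), 10)
print(sorted(captured), german_tank_estimate(captured))
```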

I think that, as rationalists, if we're going to see more clearly than conventional wisdom, we need to find sources of information with a more fundamental basis. I don't yet know what those would be.

Comment by MrFailSauce (patrick-cruce) on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-13T23:41:01.723Z · LW · GW

In reality, an AI can use algorithms that find a pretty good solution most of the time. 

If you replace "AI" with "ML" I agree with this point. And yes, this is what we can do with the networks we're scaling. But "pretty good most of the time" doesn't get you an x-risk intelligence. It gets you some really cool tools.

If the 3 sat algorithm is O(n^4) then this algorithm might not be that useful compared to other approaches. 

If 3-SAT is O(n^4) then P=NP, and we're back to Aaronson's point: the fundamental structure of reality is much different than we think it is. (Did you mean 4^n? Plenty of common algorithms are quartic.)

For the many problems that are "illegible" or "hard for humans to think about" or "confusing", we are nowhere near the bound, so the AI has room to beat the pants off us with the same data. 

The assertion that "illegible" means "requiring more intelligence" rather than "ill-posed" or "underspecified" doesn't seem obvious to me.  Maybe you can expand on this? 

 

Could a superintelligence figure out relativity based on the experiences of the typical caveman? ... These clues weren't enough to lead Einstein to relativity, but Einstein was only human. 

I'm not sure I can draw the inference that this means it was possible to generate the theory without the key observations it explains. What I'm grasping at is how we can bound the capabilities that more intelligence gives an agent. It seems intuitive to me that there must be limits, and we can look to physics and math to try to understand them. Which leads us here:

Meaningless. Asymptotic runtime complexity is a mathematical tool that assumes an infinite sequence of ever harder problems.

I disagree. We've got a highly speculative question in front of us: "What can a machine intelligence greater than ours accomplish?" We can't really know what it would be like to be twice as smart, any more than an ape can. But if we stipulate that the machine is running on Turing-complete hardware and accept NP-hardness, then we can at least put an upper bound on the capabilities of this machine.

Concretely, I can box the machine using a post-quantum cryptographic standard and know that it lacks the computational resources to break out before the heat death of the universe. More abstractly, any AI risk scenario cannot require solving NP-hard problems of more than modest size. (Because of completeness, this takes many problems, and many of the oft-posed risk scenarios, off the table.)

Comment by MrFailSauce (patrick-cruce) on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-13T19:29:49.554Z · LW · GW

I think less than human intelligence is sufficient for an x-risk because that is probably what is sufficient for a takeoff.

If less than human intelligence is sufficient, wouldn't humans have already done it? (or are you saying we're doing it right now?)

How intelligent does an agent need to be to send a HTTP request to the URL /ldap://myfirstrootkit.com on a few million domains?)

A human could do this, or write a bot to do this (and they've tried). But they'd also be detected, as would an AI. I don't see this as an x-risk so much as a manageable problem.

(GPT-3 needed like 1k discrete GPUs to train. Nvidia alone ships something on the order of >73,000k discrete GPUs... per year. How fast exactly do you think returns diminish

I suspect they'll diminish exponentially, because the threat requires solving problems of exponential hardness. To me, "1% of annual Nvidia GPUs" or "0.1% of annual GPU production" sounds like we're at roughly N-3 of the problem size we could solve by using 100% of annual GPU production.

how confident are you that there are precisely zero capability spikes anywhere in the human and superhuman regimes?

I'm not confident in that.  

Comment by MrFailSauce (patrick-cruce) on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-13T18:41:07.852Z · LW · GW

I spent some time reading the Grinnblatt paper.  Thanks again for the link.  I stand corrected on IQ being uncorrelated with stock prediction.  One part did catch my eye.

Our findings relate to three strands of the literature. First, the IQ and trading behavior analysis builds on mounting evidence that individual investors exhibit wealth-reducing behavioral biases. Research, exemplified by Barber and Odean (2000, 2001, 2002), Grinblatt and Keloharju (2001), Rashes (2001), Campbell (2006), and Calvet, Campbell, and Sodini (2007, 2009a, 2009b), shows that these investors grossly under-diversify, trade too much, enter wrong ticker symbols, are subject to the disposition effect, and buy index funds with exorbitant expense ratios. Behavioral biases like these may partly explain why so many individual investors lose when trading in the stock market (as suggested in Odean (1999), Barber, Lee, Liu, and Odean (2009); and, for Finland, Grinblatt and Keloharju (2000)). IQ is a fundamental attribute that seems likely to correlate with wealth-inhibiting behaviors.

I went to some of the references; this one seemed a particularly cogent summary.

https://faculty.haas.berkeley.edu/odean/papers%20current%20versions/behavior%20of%20individual%20investors.pdf

The take-home seems to be that high-IQ investors exceed the performance of low-IQ investors, but institutional investors exceed the performance of individual investors. Maybe it is just institutions selecting the smartest, but another coherent view is that the joint intelligence of the group ("institution") exceeds the intelligence of high-IQ individuals. We might need more data to figure it out.

Comment by MrFailSauce (patrick-cruce) on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-13T17:50:30.426Z · LW · GW

We don't know that, P vs NP is an unproved conjecture. Most real world problems are not giant knapsack problems. And there are algorithms that quickly produce answers that are close to optimal. Actually, most of the real use of intelligence is not a complexity theory problem at all. "Is inventing transistors a O(n) or an O(2^n) problem?"

 

P vs. NP is unproven. But I disagree that "most real world problems are not giant knapsack problems." The Cook-Levin theorem, and the web of reductions built on it, showed that many of the most interesting problems are NP-complete. I'm going to quote this paper by Scott Aaronson, but it is a great read and I hope you check out the whole thing. https://www.scottaaronson.com/papers/npcomplete.pdf

Even many computer scientists do not seem to appreciate how different the world would be if we could solve NP-complete problems efficiently. I have heard it said, with a straight face, that a proof of P = NP would be important because it would let airlines schedule their flights better, or shipping companies pack more boxes in their trucks! One person who did understand was Gödel. In his celebrated 1956 letter to von Neumann (see [69]), in which he first raised the P versus NP question, Gödel says that a linear or quadratic-time procedure for what we now call NP-complete problems would have “consequences of the greatest magnitude.” For such an procedure “would clearly indicate that, despite the unsolvability of the Entscheidungsproblem, the mental effort of the mathematician in the case of yes-or-no questions could be completely replaced by machines.”

But it would indicate even more. If such a procedure existed, then we could quickly find the smallest Boolean circuits that output (say) a table of historical stock market data, or the human genome, or the complete works of Shakespeare. It seems entirely conceivable that, by analyzing these circuits, we could make an easy fortune on Wall Street, or retrace evolution, or even generate Shakespeare’s 38th play. For broadly speaking, that which we can compress we can understand, and that which we can understand we can predict. Indeed, in a recent book [12], Eric Baum argues that much of what we call ‘insight’ or ‘intelligence’ simply means finding succinct representations for our sense data. On his view, the human mind is largely a bundle of hacks and heuristics for this succinct-representation problem, cobbled together over a billion years of evolution. So if we could solve the general case—if knowing something was tantamount to knowing the shortest efficient description of it—then we would be almost like gods. The NP Hardness Assumption is the belief that such power will be forever beyond our reach.

I take the NP-hardness assumption as foundational. That being the case, a lot of talk of AI x-risk sounds to me like claiming that AI will be an NP oracle. (For example, the idea that a highly intelligent expert system designing tractors could somehow "get out of the box" and threaten humanity would require a highly accurate predictive model that would almost certainly contain one or more NP-complete subproblems.)
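
As a minimal sketch of the asymmetry the NP-hardness assumption rests on (the toy formula and encoding here are mine, not from the paper): verifying a proposed answer is cheap, while finding one by brute force takes up to 2^n tries.

```python
from itertools import product

# A toy 3-SAT instance: each literal is (variable_index, is_negated).
formula = [[(0, False), (1, True), (2, False)],
           [(0, True), (2, True), (3, False)],
           [(1, False), (3, True), (0, False)]]

def check(assignment, formula):
    """Verifying a candidate solution is cheap: linear in the formula size."""
    return all(any(assignment[v] != neg for v, neg in clause) for clause in formula)

def brute_force(formula, n_vars):
    """Finding a solution by exhaustive search costs up to 2**n_vars checks."""
    for bits in product([False, True], repeat=n_vars):
        if check(bits, formula):
            return bits
    return None

print(brute_force(formula, 4))
```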

 But current weather prediction doesn't use that data. It just uses the weather satellite data, because it isn't smart enough to make sense of all the social media data. I mean you could argue that most of the good data is from the weather satellite. That social media data doesn't help much even if you are smart enough to use it. If that is true, that would be a way that the weather problem differs from many other problems.

Yes, I would argue that current weather prediction doesn't use social media data because cameras at optical wavelengths cannot sound the upper atmosphere. Physics means there is no free lunch from social media data.

I would argue that most real-world problems are observationally and experimentally bound. The seminal paper on the photoelectric effect was a direct consequence of a series of experimental results from the 19th century. Relativity is the same story. It isn't as though measurements of the speed of light, or of the ratio of photon frequency to energy, were sitting around in the 17th century, just waiting for someone with sufficient intelligence to find them in that era's equivalent of social media. And no amount of data on European peasants (the 17th-century equivalent of Facebook) would be a sufficient substitute. The right data makes all the difference.

A common AI-risk scenario, like manipulating a programmer into escalating the AI's privileges, is a less legible problem than examples from physics, but I have no reason to think it won't also be observationally bound. Attempting to manipulate the programmer, while being so accurate in the prediction that the AI is highly confident it won't be caught, would require a model of the programmer as detailed as (or more detailed than) an atmospheric model. There is no guarantee the programmer has any psychological vulnerabilities. There is no guarantee that they share the right information on social media. Even if they're a prolific poster, why would we think this information is sufficient to manipulate them?

Comment by MrFailSauce (patrick-cruce) on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-13T17:01:42.373Z · LW · GW

AlphaGo went from mediocre, to going toe-to-toe with the top human Go players in a very short span of time. And now AlphaGo Zero has beaten AlphaGo 100-0. AlphaFold has arguably made a similar logistic jump in protein folding

Do you know how many additional resources this required? 

 

Cost of compute has been decreasing at exponential rate for decades, this has meant entire classes of algorithms which straightforward scale with compute also have become exponentially more capable, and this has already had profound impact on our world. At the very least, you need to show why doubling compute speed, or paying for 10x more GPUs, does not lead to a doubling of capability for the kinds of AI we care about.

Maybe this is the core assumption that differentiates our views. I think that the "exponential growth" in compute is largely the result of being at the steepest point of a sigmoid rather than on a true exponential. For example, Dennard scaling ceased around 2007 and Moore's law has been slowing over the last decade. I'm willing to concede that if compute grows exponentially indefinitely then AI risk is plausible, but I don't see any way that will happen.

Comment by MrFailSauce (patrick-cruce) on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-13T16:48:14.230Z · LW · GW

Since you bring up selection bias, Grinblatt et al 2012 studies the entire Finnish population with a population registry approach and finds that.

Thanks for the citation. That is the kind of information I was hoping for. Do you think that slightly-better-than-human intelligence is sufficient to present an x-risk, or do you think it needs some sort of takeoff or acceleration to present an x-risk?

So?

I think I can probably explain the "so" in my response to Donald below.

Comment by MrFailSauce (patrick-cruce) on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T17:34:00.545Z · LW · GW

Overshooting by 10x (or 1,000x or 1,000,000x) before hitting 1.5x is probably easier than it looks for someone who does not have background in AI.

Do you have any examples of 10x or 1000x overshoot?  Or maybe a reference on the subject?

Comment by MrFailSauce (patrick-cruce) on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T17:14:22.440Z · LW · GW

Hmmmmm, there is a lot here; let me see if I can narrow in on some key points.

Once you have the right algorithm, it really is as simple as increasing some parameter or neuron count.

There are some problems that do not scale well (or at all). For example, doubling the computational power applied to brute-forcing the knapsack problem only lets you solve a problem that is one element bigger. Why should we presume that intelligence scales like an O(n) problem and not an O(2^n) problem?
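
A toy illustration of that scaling, using a brute-force knapsack search over made-up item values (a dynamic-programming solver scales differently, but the exhaustive search is the point here):

```python
from itertools import combinations

def best_subset_value(values, weights, capacity):
    """Brute-force 0/1 knapsack: examines all 2**n subsets."""
    n, best = len(values), 0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(weights[i] for i in subset) <= capacity:
                best = max(best, sum(values[i] for i in subset))
    return best

values, weights = [6, 10, 12], [1, 2, 3]
print(best_subset_value(values, weights, capacity=5))  # -> 22

# The work is 2**n subsets, so doubling the compute budget buys
# roughly one additional item at the same running time:
for n in (10, 11, 12):
    print(n, "items ->", 2 ** n, "subsets")
```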

What is happening here? Are both people just looking at a picture and guessing numbers, or can the IQ 150 person program a simulation while the IQ 100 person is looking at the Navier stokes equation trying (and failing) to figure out what it means. 

I picked weather prediction as an exemplar problem because:

(1) NWP programs are the product not just of single scientists but of the efforts of thousands (the intelligences of many people combined into a product far greater than any one of them could produce individually).

(2) The problem is fairly well understood and observation-limited. If we could simultaneously measure the properties of 1 m^3 voxels of atmosphere, our NWP would be dramatically improved. But our capabilities are closer to once-per-day (non-simultaneous) measurements of spots roughly 40 km in diameter. Access to the internet will not improve this; the measurements don't exist. Other chaotic systems, like ensembles of humans or stocks, may very well have this property.

Smart humans do lots better than dumb ones. Small differences in intelligence make the difference between a loglinear sort and a quadratic one.

But hardly any programs are the result of individual effort; they're the product of thousands of people. If a quadratic sort slips through, it gets caught by a profiler and someone else fixes it. (And everyone uses compilers, interpreters, libraries, etc.)

A lot of science seems to be done by a handful of top scientists. The likes of Einstein wouldn't be a thing if a higher intelligence didn't make you much better at discovering some things. 

This is not my view of science. I tend to understand someone like Einstein as an inspirational story we tell, when the history of physics in the early 20th century is in fact a tale of dozens, if not hundreds. But I do think this is getting towards the crux.

Comment by MrFailSauce (patrick-cruce) on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T16:34:05.968Z · LW · GW

Are we equivocating on 'much better' here?

Not equivocating, but if intelligence is hard to scale and slightly better is not a threat, then there is no reason to be concerned about AI risk. (Maybe the 1% x-risk suggested by the OP is in fact a 1e-9 x-risk.)

there are considerable individual differences in weather forecasting performances (it's one of the more common topics to study in the forecasting literature),

I'd be interested in seeing any papers on individual differences in weather forecasting performance (even if IQ is not mentioned).  My understanding was that it has been all NWP for the last half-century or so.  

IQ shows up all the time in other forecasting topics as a major predictor

I'd be curious to see this too. My understanding was that, for example, stock prediction was not only uncorrelated with IQ, but that above-average performance was primarily selection bias (i.e., above-average forecasters for a given time period tend to regress toward the mean over subsequent periods).

Comment by MrFailSauce (patrick-cruce) on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-11T17:44:28.499Z · LW · GW

I think I'm convinced that we can have human-capable AI (or greater) in the next century (or sooner). I'm unconvinced on a few aspects of AI alignment. Maybe you could help clarify your thinking.

(1) I don't see how a human-capable, or a bit smarter than human (say 50% smarter), AI will be a serious threat. Broadly, humans are smart because of group and social behavior. So a 1.5x-human AI might be roughly as smart as two humans? Doesn't seem too concerning.

(2) I don't see how a bit smarter than humans scales to superhuman levels of intelligence.   

(a) Most things have diminishing returns, and I don't see how one of the most elusive faculties would be an exception. (GPT-3 is a bit better than GPT-2, but GPT-3 requires 10x the resources.)

(b) Requiring 2x resources for +1 intelligence is consistent with my understanding that most of the toughest problems are NP or harder.

(3) I don't see how superhuman intelligence by itself is necessarily a threat. Most problems cannot be solved by pure reasoning alone. For example, a human with 150 IQ isn't going to be much better at predicting the weather than a person with 100 IQ. Weather prediction (and most prediction of the future) is both knowledge- (measurement-) and intelligence- (modeling-) intensive. Moreover, it is chaotic. A superhuman intelligence might just be able to tell us that it is 70.234% likely to rain tomorrow, whereas a merely human intelligence will only be able to tell us it is 70% likely. I feel like most people concerned about AI alignment assume that more intelligence would make it possible to drive the probability of rain to 99% or 1%.
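
A toy sketch of the chaos point (a logistic map standing in for the atmosphere; the numbers are invented): even a perfect model with a billionth of a unit of measurement error loses essentially all predictive power within a few dozen steps.

```python
# Two trajectories of the chaotic logistic map, started a hair apart,
# diverge to order-one differences no matter how good the model is.

def logistic(x, r=4.0):
    return r * x * (1 - x)

x, y = 0.2, 0.2 + 1e-9   # perfect model, tiny measurement error
for step in range(1, 61):
    x, y = logistic(x), logistic(y)
    if step % 10 == 0:
        print(step, abs(x - y))
```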

Comment by MrFailSauce (patrick-cruce) on Exorcizing the Speed Prior? · 2018-07-24T15:20:15.603Z · LW · GW

Current AI does stochastic search, but it is still search. Essentially the PP complexity class instead of NP/P (with a fair amount of domain-specific heuristics).

Comment by MrFailSauce (patrick-cruce) on Probabilistic decision-making as an anxiety-reduction technique · 2018-07-16T15:34:00.561Z · LW · GW

Never leave the house without your d20 :-P

But I agree with you. This seems like a simple way to do something like satisficing: avoiding the great computational cost of an optimal decision.

In terms of prior art, that is probably the field you want to explore: https://en.m.wikipedia.org/wiki/Satisficing
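
A rough sketch of the satisficing idea (the option names and threshold are invented): accept the first option that clears a bar instead of paying the full cost of finding the optimum.

```python
import random

def satisfice(options, score, threshold):
    """Return the first option that clears the bar, rather than the best one."""
    for option in options:
        if score(option) >= threshold:
            return option
    return None  # nothing cleared the bar; fall back to some other rule

# Hypothetical lunch decision: accept anything rated "good enough".
random.seed(1)
restaurants = [(f"option {i}", random.random()) for i in range(100)]
print(satisfice(restaurants, score=lambda r: r[1], threshold=0.8))
```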

Comment by MrFailSauce (patrick-cruce) on Compact vs. Wide Models · 2018-07-16T14:34:55.114Z · LW · GW

Not sure if this is helpful, but since you analogized to chip design: in chip design you typically verify using a constrained-random method when the state space grows too large to check every input exhaustively. That is, you construct a distribution over the set of plausible input strings, sample it, and feed the samples to your design. Then you compare the result to a model written in a higher-level language.

Of course, standard techniques like designing for modularity can make the state space more manageable too.
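
A rough sketch of the constrained-random idea (the design under test and the golden model here are trivial stand-ins, not real RTL): bias the stimulus toward plausible and corner-case inputs, then check the design against the reference on every sample.

```python
import random

def reference_model(a, b):   # the high-level "golden" model
    return (a + b) & 0xFFFF

def dut(a, b):               # stand-in for the design under test
    return (a + b) % 65536

def constrained_random_input():
    # Constraint: bias toward corner cases near overflow, plus ordinary values.
    if random.random() < 0.3:
        return random.choice([0, 1, 0xFFFE, 0xFFFF])
    return random.randrange(0, 0x10000)

random.seed(0)
for _ in range(10_000):
    a, b = constrained_random_input(), constrained_random_input()
    assert dut(a, b) == reference_model(a, b), (a, b)
print("10,000 constrained-random samples passed")
```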

Comment by MrFailSauce (patrick-cruce) on The Craft And The Codex · 2018-07-09T14:36:51.101Z · LW · GW

First off, Scott’s blog is awesome.

Second, the example of dieting comes to mind when I think of training rationality. While dieters are not much connected to the rationality community, they are a large group of people focused on overcoming one particular aspect of our irrationality (but without much success).

Comment by MrFailSauce (patrick-cruce) on The Fermi Paradox: What did Sandberg, Drexler and Ord Really Dissolve? · 2018-07-09T14:00:24.518Z · LW · GW

What basis is there to assume that the distribution of these variables is log-uniform? Why, in the toy example, limit the variables to the interval [0, 0.2]? Why not [0, 1]?

These choices drive the result.

The problem is that, for many of the probabilities, we don't even know enough about them to say what distribution they might take. You can't infer a meaningful distribution over variables where your sample size is 1 or 0.

Comment by MrFailSauce (patrick-cruce) on Why it took so long to do the Fermi calculation right? · 2018-07-05T14:43:06.467Z · LW · GW

I’m still not seeing a big innovation here. I’m pretty sure most researchers who look at the Drake equation think “huge sensitivity to parameterization.”

If we have a 5-parameter Drake equation, then the number of civilizations scales as X^5, so if X comes in at 0.01 we've got a 1e-10 probability of detectable civilization formation. But if we've got a 10-parameter Drake equation and X comes in at 0.01, then it implies a 1e-20 probability (extraordinarily smaller).

So yes, it has a huge sensitivity, but it is primarily a constructed sensitivity. All the Drake equation really tells us is that we don't know very much, and it probably won't be useful until we can get N above one for more of the parameters.
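
A quick arithmetic check of that constructed sensitivity:

```python
# The same per-parameter value X gives wildly different answers
# depending only on how many parameters the equation is split into.
X = 0.01
for k in (5, 10):
    print(k, "parameters ->", X ** k)   # ~1e-10 vs ~1e-20
```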

Comment by MrFailSauce (patrick-cruce) on Dissolving the Fermi Paradox, and what reflection it provides · 2018-06-30T19:36:23.841Z · LW · GW

I’m not sure I understand why they’re against point estimates. As long as the points match the mean of our estimates for the variables, then the points multiplied should match the expected value of the distribution.

Comment by MrFailSauce (patrick-cruce) on The Power of Letting Go Part I: Examples · 2018-06-29T13:40:55.204Z · LW · GW

I think this is an interesting concept and want to see where you go with it. But just to play devil's advocate, there are some pretty strong counterexamples for micromanagement. For example, many imperative languages can be ridiculously inefficient. And try solving an NP-complete problem with a genetic algorithm and you'll just get stuck in a local optimum.

Simplicity and emergence are often surprisingly effective but they’re just tools in a large toolbox.

Comment by patrick-cruce on [deleted post] 2018-06-28T15:00:52.350Z

Somewhat ironic that LW is badly in need of better captcha.

Comment by MrFailSauce (patrick-cruce) on Loss aversion is not what you think it is · 2018-06-24T05:30:56.438Z · LW · GW

I read him; he is just incorrect. "People hate losses more than they like gains" is not explained by DMU. They dislike losses to an extent far greater than predicted by DMU, and more importantly, this dislike is largely scale-invariant.

If you go read papers like the original K&T, you'll see that their data set is just a bunch of statements that are predicted to be equally preferable under DMU (because marginal utility doesn't change much for small changes in wealth). What changes the preference is simply whether K&T phrase the question in terms of a loss or a gain.

So...unsurprisingly, Kahneman is accurately describing the theory that won him the Nobel prize.

Comment by MrFailSauce (patrick-cruce) on Order from Randomness: Ordering the Universe of Random Numbers · 2018-06-22T05:23:18.582Z · LW · GW

The result you got is pretty close to the FFT of f(t) = t, which is roughly what you get from sorting noise.
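
A quick check with NumPy (the length and seed are arbitrary): the magnitude spectrum of sorted uniform noise is very close to that of a linear ramp.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
sorted_noise = np.sort(rng.uniform(-1, 1, n))   # sorted uniform noise
ramp = np.linspace(-1, 1, n)                    # f(t) = t

# Compare the low-frequency magnitude spectra of the two signals.
for k in (1, 2, 4, 8):
    print(k, abs(np.fft.rfft(sorted_noise))[k], abs(np.fft.rfft(ramp))[k])
```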

Comment by MrFailSauce (patrick-cruce) on Physics has laws, the Universe might not · 2018-06-21T16:28:54.325Z · LW · GW

All finite-length sequences exist in any infinite random sequence. So, in the same way that all the works of Shakespeare exist inside an infinite random sequence, so too does a complete representation of any finite universe.

I suppose one could argue by the anthropic principle that we happen to exist in a well-ordered finite subsequence of an infinite random sequence. But it is like multiverse theories, in that it lacks the explanatory power and verifiability of simpler theories.

Comment by MrFailSauce (patrick-cruce) on Order from Randomness: Ordering the Universe of Random Numbers · 2018-06-21T14:32:47.585Z · LW · GW

Maybe I’m being dense, and missing the mystery, but I think this reference might be helpful.

https://www.dsprelated.com/showarticle/40.php

Comment by MrFailSauce (patrick-cruce) on Loss aversion is not what you think it is · 2018-06-21T13:46:44.995Z · LW · GW

I mean... he quotes Kahneman while claiming the guy doesn't know the implications of his own theory.

Losses hurt more than gains even at scales where DMU predicts that they should not (because your DMU curve is approximately flat for small losses and gains). Loss aversion is the psychological result that explains this effect.

This is the author’s conclusion: “So, please, don’t go around claiming that behavioral economists are incorporating some brilliant newfound insight that people hate losses more than they like gains. We’ve known about this in price theory since Alfred Marshall’s 1890 Principles of Economics.”

Sorry, nope. Alfred Marshall's Principles would have made the wrong prediction.

Comment by MrFailSauce (patrick-cruce) on Loss aversion is not what you think it is · 2018-06-21T02:21:21.586Z · LW · GW

That makes a lot of sense to me. Aversion to small losses makes a ton of sense as a blanket rule, when the gamble is:

- lose: don't eat today
- win: eat double today
- don't play: eat today

Our ancestors probably faced this gamble since long before humans were even humans. Under those stable conditions, a heuristic that accounted for scale would have been needlessly expensive.

Comment by MrFailSauce (patrick-cruce) on Loss aversion is not what you think it is · 2018-06-20T21:08:00.901Z · LW · GW

In short, the author is wrong. Diminishing marginal utility only really applies when the stakes are on the order of the agent's total wealth, whereas the loss-aversion asymmetry holds true for relatively small sums.
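
A small worked example of that point, using log utility as a stand-in DMU curve (the numbers are illustrative): for an agent with $100k of wealth, a 50/50 bet of +$110 / -$100 is utility-positive under DMU, yet most people refuse bets like it.

```python
import math

wealth = 100_000
gain, loss = 110, 100

u = math.log                                 # a standard diminishing-marginal-utility curve
delta_gain = u(wealth + gain) - u(wealth)    # ~ +0.0011
delta_loss = u(wealth) - u(wealth - loss)    # ~ +0.0010
print(delta_gain, delta_loss, delta_gain > delta_loss)  # DMU says: take the bet
```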