Fractional progress estimates for AI timelines and implied resource requirements 2021-07-15T18:43:10.163Z
Rogue AGI Embodies Valuable Intellectual Property 2021-06-03T20:37:30.805Z
New article on in vitro iterated embryo selection 2013-08-08T19:28:16.758Z
Why do theists, undergrads, and Less Wrongers favor one-boxing on Newcomb? 2013-06-19T01:55:05.775Z
Normative uncertainty in Newcomb's problem 2013-06-16T02:16:44.853Z
[Retracted] Simpson's paradox strikes again: there is no great stagnation? 2012-07-30T17:55:04.788Z
Satire of Journal of Personality and Social Psychology's publication bias 2012-06-05T00:08:27.479Z
Using degrees of freedom to change the past for fun and profit 2012-03-07T02:51:55.367Z
"The Journal of Real Effects" 2012-03-05T03:07:02.685Z
Feed the spinoff heuristic! 2012-02-09T07:41:28.468Z
Robopocalypse author cites Yudkowsky's paperclip scenario 2011-07-17T02:18:50.042Z
Follow-up on ESP study: "We don't publish replications" 2011-07-12T20:48:19.884Z
Proposal: consolidate meetup announcements before promotion 2011-05-03T01:34:26.807Z
Future of Humanity Institute hiring postdocs from philosophy, math, CS 2011-02-02T00:39:04.509Z
Future of Humanity Institute at Oxford hiring postdocs 2010-11-24T21:40:00.597Z
Probability and Politics 2010-11-24T17:02:11.537Z
Nils Nilsson's AI History: The Quest for Artificial Intelligence 2010-10-31T19:33:39.378Z
Politics as Charity 2010-09-23T05:33:57.645Z
Singularity Call For Papers 2010-04-10T16:08:00.347Z
December 2009 Meta Thread 2009-12-17T03:41:17.341Z
Boston Area Less Wrong Meetup: 2 pm Sunday October 11th 2009-10-07T21:15:14.155Z
New Haven/Yale Less Wrong Meetup: 5 pm, Monday October 12 2009-10-07T20:35:09.646Z
Open Thread: March 2009 2009-03-26T04:04:07.047Z
Don't Revere The Bearer Of Good Info 2009-03-21T23:22:50.348Z


Comment by CarlShulman on What will 2040 probably look like assuming no singularity? · 2021-05-20T13:51:30.346Z · LW · GW

There is at least one firm doing drone delivery in China and they just approved a standard for it.

Comment by CarlShulman on What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) · 2021-04-18T01:53:27.638Z · LW · GW

Mainly, that such complete (and irreversible!) delegation to such incompetent systems would be necessary or actually carried out. If AI is so powerful that nuclear weapons are launched on a hair trigger without direction from human leadership, I expect it not to be awful at forecasting that risk.

You could tell a story where bargaining problems lead to mutual destruction, but the outcome shouldn't be very surprising on average, i.e. the AI should be telling you about it happening with calibrated forecasts.

Comment by CarlShulman on What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) · 2021-04-15T05:53:46.658Z · LW · GW

The US and China might well wreck the world by knowingly taking gargantuan risks even if both had aligned AI advisors, although I think they likely wouldn't.

But what I'm saying is really hard to do is to make the scenarios in the OP (with competition among individual corporate boards and the like) occur without extreme failure of 1-to-1 alignment (for both companies and governments). Competitive pressures are the main reason why AI systems with inadequate 1-to-1 alignment would be given long enough leashes to bring catastrophe. I would cosign Vanessa and Paul's comments about these scenarios being hard to fit with the idea that technical 1-to-1 alignment work is much less impactful than cooperative RL or the like.


In more detail, I assign a ≥10% chance to a scenario where two or more cultures each progressively diminish the degree of control they exercise over their tech, and the safety of the economic activities of that tech to human existence, until an involuntary human extinction event.  (By comparison, I assign at most around a ~3% chance of a unipolar "world takeover" event, i.e., I'd sell at 3%.)

If this means that a 'robot rebellion' would include software produced by more than one company or country, I think that is a substantial possibility, as is the alternative. Competitive dynamics in a world with a few giant countries and a few giant AI companies (and only a couple of leading chip firms) can mean that safety tradeoffs play out either through one party introducing rogue AI systems that outcompete by not paying an alignment tax (while intrinsically embodying astronomically valuable and expensive IP in themselves), or through cascading alignment failure in software traceable to a leading company/consortium or country/alliance.

But either way reasonably effective 1-to-1 alignment methods (of the 'trying to help you and not lie to you and murder you with human-level abilities' variety) seem to eliminate a supermajority of the risk.

[I am separately skeptical that technical work on multi-agent RL is particularly helpful, since it can be done by 1-to-1 aligned systems when they are smart, and the more important coordination problems seem to be earlier between humans in the development phase.]

Comment by CarlShulman on What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) · 2021-04-12T23:07:51.292Z · LW · GW

I think I disagree with you on the tininess of the advantage conferred by ignoring human values early on during a multi-polar take-off.  I agree the long-run cost of supporting humans is tiny, but I'm trying to highlight a dynamic where fairly myopic/nihilistic power-maximizing entities end up quickly out-competing entities with other values, due to, as you say, bargaining failure on the part of the creators of the power-maximizing entities.

Right now the United States has a GDP of >$20T, US plus its NATO allies and Japan >$40T, the PRC >$14T, with a world economy of >$130T. For AI and computing industries the concentration is even greater.

These leading powers are willing to regulate companies and invade small countries based on reasons much less serious than imminent human extinction. They have also avoided destroying one another with nuclear weapons.

If one-to-one intent alignment works well enough that one's own AI will not blatantly lie about upcoming AI extermination of humanity, then superintelligent locally-aligned AI advisors will tell the governments of these major powers (and many corporate and other actors with the capacity to activate governmental action) about the likely downside of conflict or unregulated AI havens (meaning specifically the deaths of the top leadership and everyone else in all countries).

All Boards wish other Boards would stop doing this, but neither they nor their CEOs manage to strike up a bargain with the rest of the world to stop it.

Within a country, one-to-one intent alignment for government officials or actors who support the government means superintelligent advisors identify and assist in suppressing attempts by an individual AI company or its products to overthrow the government.

Internationally, with the current balance of power (and with fairly substantial deviations from it) a handful of actors have the capacity to force a slowdown or other measures to stop an outcome that will otherwise destroy them.  They (and the corporations that they have legal authority over, as well as physical power to coerce) are few enough to make bargaining feasible, and powerful enough to pay a large 'tax' while still being ahead of smaller actors. And I think they are well enough motivated to stop their imminent annihilation, in a way that is more like avoiding mutual nuclear destruction than cosmopolitan altruistic optimal climate mitigation timing.

That situation could change if AI enables tiny firms and countries to match the superpowers in AI capabilities or WMD before leading powers can block it.

So I agree with others in this thread that good one-to-one alignment basically blocks the scenarios above.

Comment by CarlShulman on Another (outer) alignment failure story · 2021-04-12T21:45:26.817Z · LW · GW

I think they are fighting each other all the time, though mostly in very prosaic ways (e.g. McDonald's and Burger King's marketing AIs are directly competing for customers). Are there some particular conflicts you imagine that are suppressed in the story?


I think the one that stands out the most is 'why isn't it possible for some security/inspector AIs to get a ton of marginal reward by whistleblowing against the efforts required for a flawless global camera grab?' I understand the scenario says it isn't because the demonstrations are incomprehensible, but why/how?

Comment by CarlShulman on "New EA cause area: voting"; or, "what's wrong with this calculation?" · 2021-02-27T21:25:06.341Z · LW · GW

That is the opposite error, where one cuts off the close election cases. The joint probability density function over vote totals is smooth because of uncertainty (which you can see from polling errors), so your chance of being decisive scales inversely with the size of the electorate and the margin of error in polling estimation.

Comment by CarlShulman on "New EA cause area: voting"; or, "what's wrong with this calculation?" · 2021-02-27T21:21:58.035Z · LW · GW

The error is a result of assuming the coin is exactly 50%, in fact polling uncertainties mean your probability distribution over its 'weighting' is smeared over at least several percentage points. E.g. if your credence from polls/538/prediction markets is smeared uniformly from 49% to 54%, then the chance of the election being decided by a single vote is one divided by 5% of the # of voters.

You can see your assumption is wrong because it predicts that tied elections should be many orders of magnitude more common than they are. There is a symmetric error where people assume that the coin has a weighting away from 50%, so the chances of your vote mattering approach zero. Once you have a reasonable empirical distribution over voting propensities fit to reproduce actual election margins both these errors go away.

See Andrew Gelman's papers on this.
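The back-of-the-envelope calculation above can be sketched in Python. The uniform 49%–54% spread and the 10-million-voter electorate are illustrative assumptions, not estimates for any particular election:

```python
def p_decisive(n_voters, lo=0.49, hi=0.54):
    """Chance one vote decides the election, when your credence over the
    candidate's underlying vote share is uniform on [lo, hi]."""
    if not (lo <= 0.5 <= hi):
        # If 50% lies outside the credible range, a tie is (to first order)
        # impossible and your vote is essentially never decisive.
        return 0.0
    density_at_tie = 1.0 / (hi - lo)  # uniform probability density at 50%
    # P(decisive) ~ density at the tie point, spread over n_voters outcomes.
    return density_at_tie / n_voters

# With 10 million voters: 20 / 1e7 = 1 in 500,000,
# i.e. one divided by 5% of the number of voters.
print(p_decisive(10_000_000))
```

Note that once the credible interval excludes 50% (a safe seat), the decisiveness probability collapses toward zero, which is the symmetric error described above.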

Comment by CarlShulman on The Upper Limit of Value · 2021-01-28T21:06:09.275Z · LW · GW

As I said, the story was in combination with one-boxing decision theories and our duplicate counterparts.

Comment by CarlShulman on The Upper Limit of Value · 2021-01-28T14:31:27.047Z · LW · GW

I suppose by 'the universe' I meant what you would call the inflationary multiverse, that is including distant regions we are now out of contact with. I personally tend not to call regions separated by mere distance separate universes.

"and the only impact of our actions with infinite values is the number of black holes we create."

Yes, that would be the infinite impact I had in mind, doubling the number would double the number of infinite branching trees of descendant universes.

Re simulations, yes, there is indeed a possibility of influencing other levels, although we would be more clueless, and it is a way for us to be in a causally connected patch with infinite future.

Comment by CarlShulman on The Upper Limit of Value · 2021-01-27T22:36:22.642Z · LW · GW

Thanks David, this looks like a handy paper! 

Given all of this, we'd love feedback and discussion, either as comments here, or as emails, etc.

I don't agree with the argument that infinite impacts of our choices are of Pascalian improbability, in fact I think we probably face them as a consequence of one-boxing decision theory, and some of the more plausible routes to local infinite impact are missing from the paper:

  • The decision theory section misses the simplest argument for infinite value: in an infinite inflationary universe with infinite copies of me, my choices are multiplied infinitely. If I would one-box on Newcomb's Problem, then I would take the difference between eating the sandwich and not doing so to be scaled out infinitely. I think this argument is in fact correct and follows from our current cosmological models combined with one-boxing decision theories.
  • Under 'rejecting physics' I didn't see any mention of baby universes, e.g. Lee Smolin's cosmological natural selection. If that picture were right, or anything else in which we can affect the occurrence of new universes/inflationary bubbles forming, then that would permit infinite impacts.
  • The simulation hypothesis is a plausible way for our physics models to be quite wrong about the world in which the simulation is conducted, and further there would be reason to think simulations would be disproportionately conducted under physical laws that are especially conducive to abundant computation.

Comment by CarlShulman on What trade should we make if we're all getting the new COVID strain? · 2020-12-27T16:45:56.275Z · LW · GW

Little reaction to the new strain news, or little reaction to new strains outpacing vaccines and getting a large chunk of the population over the next several months?

Comment by CarlShulman on The Colliding Exponentials of AI · 2020-11-01T03:26:43.210Z · LW · GW

These projections in figure 4 seem to falsely assume that optimal training compute scales linearly with model size. It doesn't: you also need to show more data points to the larger models, so training compute grows superlinearly, as discussed in the OpenAI scaling papers. That changes the results by orders of magnitude (there is uncertainty about which of two inconsistent scaling trends to extrapolate further out, as discussed in the papers).
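A rough sketch of the distinction in Python. The ~6·N·D FLOPs-per-parameter-per-token rule is a standard approximation; the data-scaling exponent `alpha` and the reference model/dataset sizes below are illustrative assumptions (the scaling papers themselves report inconsistent trends):

```python
def train_flops(params, tokens):
    # Common approximation: ~6 FLOPs per parameter per training token.
    return 6 * params * tokens

def linear_extrapolation(params, tokens_ref):
    # The figure-4-style assumption: compute scales linearly with model
    # size, with training data held fixed.
    return train_flops(params, tokens_ref)

def data_scaled_extrapolation(params, params_ref, tokens_ref, alpha=0.74):
    # If optimal training data also grows with model size (tokens ~ N**alpha,
    # alpha assumed here), total compute grows superlinearly in params.
    tokens = tokens_ref * (params / params_ref) ** alpha
    return train_flops(params, tokens)

# Scaling a reference model up 1000x in parameters:
ref_n, ref_d = 175e9, 300e9
ratio = (data_scaled_extrapolation(1000 * ref_n, ref_n, ref_d)
         / linear_extrapolation(1000 * ref_n, ref_d))
# ratio ~ 1000**0.74 ~ 166: the linear projection understates compute
# requirements by orders of magnitude at large extrapolations.
```

The exact gap depends on which scaling trend you extrapolate, but any positive data-scaling exponent makes the linear projection an underestimate that grows with model size.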

Comment by CarlShulman on Rafael Harth's Shortform · 2020-08-17T21:34:13.208Z · LW · GW


Comment by CarlShulman on Are we in an AI overhang? · 2020-07-29T15:43:16.955Z · LW · GW
Maybe the real problem is just that it would add too much to the price of the car?

Yes. GPU/ASICs in a car will have to sit idle almost all the time, so the costs of running a big model on it will be much higher than in the cloud.

Comment by CarlShulman on Rafael Harth's Shortform · 2020-07-23T16:04:40.404Z · LW · GW

I'm not a utilitarian, although I am closer to that than most people (scope sensitivity goes a long way in that direction), and find it a useful framework for highlighting policy considerations (but not the only kind of relevant normative consideration).

And no, Nick did not assert an estimate of x-risk as simultaneously P and <P.

Comment by CarlShulman on Tips/tricks/notes on optimizing investments · 2020-06-05T17:44:44.891Z · LW · GW

This can prevent you from being able to deduct the interest as investment interest expense on your taxes due to interest tracing rules (you have to show the loan was not commingled with non-investment funds in an audit), and create a recordkeeping nightmare at tax time.

Comment by CarlShulman on Open & Welcome Thread - June 2020 · 2020-06-05T15:43:55.152Z · LW · GW

Re hedging, a common technique is having multiple fairly different citizenships and foreign-held assets, i.e. such that if your country becomes dangerously oppressive, you or your assets wouldn't be handed back to it. E.g. many Chinese elites pick up a Western citizenship for themselves or their children, and wealthy people fearing change in the US sometimes pick up New Zealand or Singapore homes and citizenship.

There are many countries with schemes to sell citizenship, although often you need to live in them for some years after making your investment. You can then emigrate, if things start to look too scary, before emigration is restricted.

My sense, however, is that the current risk of needing this is very low in the US, and the most likely reason for someone with the means to buy citizenship to leave would just be increases in wealth/investment taxes through the ordinary political process, with extremely low chance of a surprise cultural revolution (with large swathes of the population imprisoned, expropriated or killed for claimed ideological offenses) or ban on emigration. If you take enough precautions to deal with changes in tax law I think you'll be taking more than you need to deal with the much less likely cultural revolution story.

Comment by CarlShulman on The EMH Aten't Dead · 2020-05-19T05:43:10.121Z · LW · GW
April was the stock market's best month in 30 years, which is not really what you expect during a global pandemic.

Historically the biggest short-term gains have been disproportionately amidst or immediately following bear markets, when volatility is highest.

Comment by CarlShulman on The EMH Aten't Dead · 2020-05-18T17:02:26.722Z · LW · GW

Sure, it's part of how they earn money, but competition between them limits what's left, since they're bidding against each other to take the other side from the retail investor, who buys from or sells to the hedge fund offering the best deal at the time (made somewhat worse by deadweight losses from investing in speed).

Comment by CarlShulman on The EMH Aten't Dead · 2020-05-18T17:00:49.619Z · LW · GW
It doesn't suggest that. Factually, we know that a majority of investors underperform indexes.

Absolutely, I mean that when you break out the causes of the underperformance, you can see how much is from spending time out of the market, from paying high fees, from excessive trading to pay spreads and capital gains taxes repeatedly, from retail investors not starting with all their future earnings invested (e.g. often a huge factor in the Dalbar studies commonly cited to sell high fee mutual funds to retail investors), and how much from unwittingly identifying overpriced securities and buying them. And the last chunk is small relative to the rest.

When there's an event that will cause retail investors to predictably make bad investments, some hedge fund will do high frequency trades as soon as the event becomes known, to be able to trade the opposite side of the trade.

I agree, active investors correcting retail investors can earn normal profits on the EMH, and certainly market makers get spreads. But competition is strong, and spreads have been shrinking, so that's much less damaging than identifying seriously overpriced stocks and buying them.

Comment by CarlShulman on The EMH Aten't Dead · 2020-05-18T02:55:50.800Z · LW · GW

Thank you, I enjoyed this post.

One thing I would add is that the EMH also suggests one can make deviations that don't have very high EMH-predicted costs. Small investors do underperform indexes a lot by paying extra fees, churning with losses to spreads and capital gains taxes, spending time out of the market, and taking on too much or too little risk (and especially too much uncompensated risk from under-diversification). But given the EMH they also can't actively pick equities with large expected underperformance. Otherwise, a hedge fund could make huge profits by just doing the opposite (they compete the mispricing down to a level where they earn normal profits). Reversed stupidity is not intelligence. [Edited paragraph to be clear that typical retail investors do severely underperform, just mainly for reasons other than uncanny ability to find overpriced securities and buy them.]

That consideration makes it more attractive, if one is uncertain about an edge, to consider investments that the EMH would predict should have very modest underperformance, but that some unusual information would suggest should outperform a lot. I was persuaded to deviate from indexing after seeing high returns across several 'would-have-invested-in' (or did invest a little in, registered predictions on, etc.) cases of the sort Wei Dai discusses. So far doing so has been kind to my IRR vs benchmarks, but because I've only seen results across a handful of deviations (one was coronavirus-inspired market puts, inspired in part by Wei Dai and my understanding from colleagues in the pandemic space, and held until late March based on a prior plan of waiting for clear community transmission in the US to become visible), the likelihood ratio is weak between the bottom two quadrants of your figure. I might fill in 'deluded lucky fool' in your poll. Yet I don't demand a very high credence in the good quadrant to outweigh the underdiversification costs of using these deviations as a stock-picking random number generator. That said, the bar for even that much credence in a purported edge is still very demanding.

I'd also flag that going all-in on EMH and modern financial theory still leads to fairly unusual investing behavior for a retail investor, more so than I had thought before delving into it. E.g. taking human capital into account in portfolio design, or really understanding the utility functions and beliefs required to justify standard asset allocation advice (vs something like maximizing expected growth rate/log utility of income/Kelly criterion, without a 0 leverage constraint), or just figuring out all the tax optimization (and investment choice interactions with tax law), like the Mega Backdoor Roth, donating appreciated stock, tax loss harvesting, or personal defined benefit pension plans. So there's a lot more to doing EMH investing right than just buying a Vanguard target date fund, and I would want to encourage people to do that work regardless.
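As a small illustration of the growth-rate framing mentioned above (the 60%/even-odds bet is purely illustrative, not advice): the Kelly fraction for a simple binary bet is the stake that maximizes expected log growth of the bankroll.

```python
import math

def kelly_fraction(p, b):
    """Optimal fraction of bankroll to stake on a bet paying b:1
    with win probability p (the Kelly criterion)."""
    return p - (1 - p) / b

def expected_log_growth(f, p, b):
    # Expected log growth rate per bet when staking fraction f.
    return p * math.log(1 + f * b) + (1 - p) * math.log(1 - f)

# A 60% coin at even odds: Kelly says stake 20% of bankroll.
f_star = kelly_fraction(0.6, 1.0)  # 0.2
# Neighboring fractions grow the bankroll more slowly, confirming the optimum:
assert expected_log_growth(f_star, 0.6, 1.0) > expected_log_growth(0.1, 0.6, 1.0)
assert expected_log_growth(f_star, 0.6, 1.0) > expected_log_growth(0.3, 0.6, 1.0)
```

The point of the comparison with standard advice is that Kelly-style log-utility maximization generally implies different (often more aggressive) allocations than typical glide-path funds, unless one's actual risk preferences or constraints justify the difference.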

Comment by CarlShulman on Fast Takeoff in Biological Intelligence · 2020-04-27T03:03:01.255Z · LW · GW

I agree human maturation time is enough on its own to rule out a human reproductive biotech 'fast takeoff,' but also:

  • In any given year the number of new births is very small relative to the existing workforce, of billions of humans, including many people with extraordinary abilities
  • Most of those births are unplanned or to parents without access to technologies like IVF
  • New reproductive technologies are adopted gradually by risk-averse parents
  • Any radical enhancement would carry serious risks of negative surprise side effects, further reducing the user base of new tech
  • IVF is only used for a few percent of births in rich countries, and existing fancy versions are used even less frequently

All of those factors would smooth out any such application to spread out expected impacts over a number of decades, on top of the minimum from maturation times.

Comment by CarlShulman on 2019 AI Alignment Literature Review and Charity Comparison · 2020-01-06T21:18:32.969Z · LW · GW
MIRI researchers contributed to the following research led by other organisations
MacAskill & Demski's A Critique of Functional Decision Theory

This seems like a pretty weird description of Demski replying to MacAskill's draft.

Comment by CarlShulman on Does GPT-2 Understand Anything? · 2020-01-03T08:00:09.743Z · LW · GW

The interesting content kept me reading, but it would help the reader to have lines between paragraphs in the post.

Comment by CarlShulman on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-26T21:12:09.173Z · LW · GW

I have launch codes and don't think this is good. Specifically, I think it's bad.

Comment by CarlShulman on Why so much variance in human intelligence? · 2019-08-23T23:21:25.592Z · LW · GW

A mouse brain has ~75 million neurons, a human brain ~85 billion. The standard deviation of human brain size is ~10%. If we think of that as a proportional increase rather than an absolute increase in the # of neurons, that's ~74 standard deviations of difference. The correlation between # of neurons and IQ in humans is ~0.3, but that's still a massive difference. Total neurons/computational capacity does show a pattern somewhat like that in the figure. Chimps' brains are a factor of ~3 smaller than humans', ~12 standard deviations.
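The standard-deviation arithmetic above can be checked directly, treating each SD as a ~10% proportional step so that gaps add on a log scale:

```python
import math

MOUSE_NEURONS = 75e6
HUMAN_NEURONS = 85e9
SD_FRACTION = 0.10  # ~10% proportional standard deviation of human brain size

def sd_gap(smaller, larger, sd=SD_FRACTION):
    # Number of proportional standard deviations separating the two counts:
    # each SD multiplies the count by (1 + sd), so divide on the log scale.
    return math.log(larger / smaller) / math.log(1 + sd)

print(round(sd_gap(MOUSE_NEURONS, HUMAN_NEURONS)))      # mouse->human: ~74 SDs
print(round(sd_gap(HUMAN_NEURONS / 3, HUMAN_NEURONS)))  # chimp->human: ~12 SDs
```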

Selection can cumulatively produce gaps that are large relative to intraspecific variation (one can see the same relationships even more blatantly considering total body mass). Mice do show substantial variation in maze performance, etc.

And the cumulative cognitive work that has gone into optimizing the language, technical toolkit, norms, and other factors involved in human culture and training is immensely beyond that of mice (and note that human training of animals can greatly expand the set of tasks they can perform, especially with some breeding to adjust their personalities to be more enthusiastic about training). Humans with their language abilities can properly interface with that culture, dwarfing the capabilities both of small animals and of people in smaller, earlier human cultures with less accumulated technology or economies of scale.

Hominid culture took off enabled by human capabilities [so we are not incredibly far from the minimum needed for strongly accumulating culture, the selection effect you reference in the post], and kept rising over hundreds of thousands and millions of years, at an accelerating pace as the population grew with new tech, expediting further technical advance. Different regions advanced at different rates (generally larger connected regions grew faster, with more innovators to accumulate innovations), but all but the smallest advanced. So if humans overall had lower cognitive abilities, there would be slack for technological advance to have happened anyway, just at slower rates (perhaps manyfold slower), accumulating more by trial and error.

Human individual differences are also amplified by individual control over environments, e.g. people who find studying more congenial or fruitful study more and learn more.

Comment by CarlShulman on Tal Yarkoni: No, it's not The Incentives—it's you · 2019-07-25T06:02:25.700Z · LW · GW

Survey and other data indicate that in these fields most people were doing p-hacking/QRPs (running tests selected ex post, optional stopping, reporting and publication bias, etc), but a substantial minority weren't, with individual, subfield, and field variation. Some people produced ~100% bogus work while others were ~0%. So it was possible to have a career without the bad practices Yarkoni criticizes, aggregating across many practices to look at overall reproducibility of research.

And he is now talking about people who have been informed about the severe effects of the QRPs (that they result in largely bogus research at large cost to science compared to reproducible alternatives that many of their colleagues are now using and working to reward) but choose to continue the bad practices. That group is also disproportionately tenured, so it's not a question of not getting a place in academia now, but of giving up on false claims they built their reputation around and reduced grants and speaking fees.

I think the core issue is that even though the QRPs that lead to mostly bogus research in fields such as social psych and neuroimaging often started off without intentional bad conduct, their bad effects have now become public knowledge, and Yarkoni is right to call out those people on continuing them and defending continuing them.

Comment by CarlShulman on Unconscious Economics · 2019-03-28T02:40:26.775Z · LW · GW

There is a literature on firm productivity showing large variation in productivity across firms, with average productivity growing through the expansion of productive firms relative to less productive ones. E.g. this, this, this, and this.

Comment by CarlShulman on What failure looks like · 2019-03-27T19:25:24.625Z · LW · GW

OK, thanks for the clarification!

My own sense is that the intermediate scenarios are unstable: if we have fairly aligned AI, we immediately use it to make more aligned AI and collectively largely reverse things like Facebook click-maximization manipulation. If we have lost the power to reverse things, then they go all the way to near-total loss of control over the future. So I would tend to think we wind up in the extremes.

I could imagine a scenario where there is a close balance among multiple centers of AI+human power, and some but not all of those centers have local AI takeovers before the remainder solve AI alignment, and then you get a world that is a patchwork of human-controlled and autonomous states, both types automated. E.g. the United States and China are taken over by their AI systems (including robot armies), but the Japanese AI assistants and robot army remain under human control and the future geopolitical system keeps both types of states intact thereafter.

Comment by CarlShulman on What failure looks like · 2019-03-27T04:09:56.362Z · LW · GW
Failure would presumably occur before we get to the stage of "robot army can defeat unified humanity"---failure should happen soon after it becomes possible, and there are easier ways to fail than to win a clean war. Emphasizing this may give people the wrong idea, since it makes unity and stability seem like a solution rather than a stopgap. But emphasizing the robot army seems to have a similar problem---it doesn't really matter whether there is a literal robot army, you are in trouble anyway.

I agree other powerful tools can achieve the same outcome, and since in practice humanity isn't unified rogue AI could act earlier, but either way you get to AI controlling the means of coercive force, which helps people to understand the end-state reached.

It's good to both understand the events by which one is shifted into the bad trajectory, and to be clear on what the trajectory is. It sounds like your focus on the former may have interfered with the latter.

Comment by CarlShulman on What failure looks like · 2019-03-27T04:02:42.168Z · LW · GW
I think we can probably build systems that really do avoid killing people, e.g. by using straightforward versions of "do things that are predicted to lead to videos that people rate as acceptable," and that at the point when things have gone off the rails those videos still look fine (and to understand that there is a deep problem at that point you need to engage with complicated facts about the situation that are beyond human comprehension, not things like "are the robots killing people?"). I'm not visualizing the case where no one does anything to try to make their AI safe, I'm imagining the most probable cases where people fail.

Haven't you yourself written about the failure modes of 'do things predicted to lead to videos that people rate as acceptable', where the attack involves surreptitiously reprogramming the camera to get optimal videos (including weird engineered videos designed to optimize on infelicities in the learned objective)? E.g. in your discussion of an unaligned benchmark here:

We train an autoregressive generative model, to maximize the log probability assigned to the actual sequence of actions and observations produced by the AI (with each observation conditioned on the past actions). This isn’t actually a good way to train the generative model, but it’s not really central to the discussion.
We train the reward function by showing humans sequences of actions and predicted observations, asking them to assign scores, then predicting those scores with supervised learning. We show humans the sequences of actions that look most promising to the system...
I am quite confident that “world which looks appealing on screen” is not the same as “world which is good;” the gap is already large today, and it will grow much larger over time.
Maintaining a fiction. Ultimately the goal of this AI is to make the sensors look like good things are happening in the world. This can be done either by making the world good or by carefully controlling what gets fed to the sensors. In general, there are many fewer constraints on “what gets fed to the sensors” than “what the world is actually like,” and so it’s going to be possible to produce much more desirable-looking outcomes, much more easily, by controlling the sensors.
If the humans understand what the AI is doing in detail then they can observe the sequence of actions to notice any actions that might corrupt the sensors. But as soon as the AI is engaging in complicated plans this appears to be totally intractable. Even if the AI was human-level, there are many extremely hard-to-detect plans: for example, the AI could collaborate with an external attacker, who will surreptitiously corrupt the sensor. Or the AI could create a copy of itself on an external server who then helped orchestrate a hard-to-detect corruption.

Are you now very confident that no AI company would implement something with this vulnerability? Or does the paragraph below just mean that the part where the AI gets control of the camera has to happen before the robot armies (or even robot armies identified as traceable to the AI in question), which then happen?

Part I has this focus because (i) that's where I think the action is---by the time you have robot armies killing everyone the ship is so sailed, I think a reasonable common-sense viewpoint would acknowledge this by reacting with incredulity to the "robots kill everyone" scenario, and would correctly place the "blame" on the point where everything got completely out of control even though there weren't actually robot armies yet (ii) the alternative visualization leads people to seriously underestimate the difficulty of the alignment problem, (iii) I was trying to describe the part of the picture which is reasonably accurate regardless of my views on the singularity.

Because it definitely seems that Vox got the impression from it that there is never a robot army takeover in the scenario, not that it's slightly preceded by camera hacking.

Is the idea that the AI systems develop goals over the external world (rather than the sense inputs/video pixels), so that they are really pursuing the appearance of prosperity, or corporate profits, and so don't just wirehead their sense inputs as in your benchmark post?

Comment by CarlShulman on What failure looks like · 2019-03-26T22:15:14.270Z · LW · GW

I think the kind of phrasing you use in this post and others like it systematically misleads readers into thinking that in your scenarios there are no robot armies seizing control of the world (or rather, that all armies worth anything at that point are robotic, and so AIs in conflict with humanity means military force that humanity cannot overcome). I.e. AI systems pursuing badly aligned proxy goals or influence-seeking tendencies wind up controlling or creating that military power and expropriating humanity (which eventually couldn't fight back thereafter even if unified).

E.g. Dylan Matthews' Vox writeup of the OP seems to think that your scenarios don't involve robot armies taking control of the means of production and using the universe for their ends against human objections or killing off existing humans (perhaps destructively scanning their brains for information but not giving good living conditions to the scanned data):

Even so, Christiano’s first scenario doesn’t precisely envision human extinction. It envisions human irrelevance, as we become agents of machines we created.
Human reliance on these systems, combined with the systems failing, leads to a massive societal breakdown. And in the wake of the breakdown, there are still machines that are great at persuading and influencing people to do what they want, machines that got everyone into this catastrophe and yet are still giving advice that some of us will listen to.

The Vox article also mistakes the source of influence-seeking patterns, taking the point to be about social influence rather than selection: systems that try to increase in power and numbers tend to succeed at doing so, and so are selected for if we accidentally or intentionally produce them and don't effectively weed them out (this is why living things are adapted to survive and expand). Such drives motivate conflict with humans when power and reproduction can be obtained by conflict with humans, which can look like robot armies taking control. That seems to me just a mistake about the meaning of influence you had in mind here:

Often, he notes, the best way to achieve a given goal is to obtain influence over other people who can help you achieve that goal. If you are trying to launch a startup, you need to influence investors to give you money and engineers to come work for you. If you’re trying to pass a law, you need to influence advocacy groups and members of Congress.
That means that machine-learning algorithms will probably, over time, produce programs that are extremely good at influencing people. And it’s dangerous to have machines that are extremely good at influencing people.

Comment by CarlShulman on Act of Charity · 2018-12-18T20:14:49.885Z · LW · GW

There's an enormous difference between having millions of dollars of operating expenditures in an LLC (so that an org is legally allowed to do things like investigate non-deductible activities like investment or politics), and giving up the ability to make billions of dollars of tax-deductible donations. Open Philanthropy being an LLC (so that its own expenses aren't tax-deductible, but it has LLC freedom) doesn't stop Good Ventures from making all relevant donations tax-deductible, and indeed the overwhelming majority of grants on its grants page are deductible.

Comment by CarlShulman on Two Neglected Problems in Human-AI Safety · 2018-12-18T17:38:24.320Z · LW · GW

I think this is under-discussed, but also that I have seen many discussions in this area. E.g. I have seen it come up and brought it up in the context of Paul's research agenda, where success relies on humans being able to play their part safely in the amplification system. Many people say they are more worried about misuse than accident on the basis of the corruption issues (and much discussion about CEV and idealization, superstimuli, etc addresses the kind of path-dependence and adversarial search you mention).

However, those varied problems mostly aren't formulated as 'ML safety problems in humans' (I have seen robustness and distributional shift discussion for Paul's amplification, and daemons/wireheading/safe-self-modification for humans and human organizations), and that seems like a productive framing for systematic exploration, going through the known inventories and trying to see how they cross-apply.

Comment by CarlShulman on "Artificial Intelligence" (new entry at Stanford Encyclopedia of Philosophy) · 2018-07-19T19:59:26.860Z · LW · GW

No superintelligent AI computers, because they lack hypercomputation.

Comment by CarlShulman on "Artificial Intelligence" (new entry at Stanford Encyclopedia of Philosophy) · 2018-07-19T19:45:47.604Z · LW · GW

Another Bringsjord classic:

> However, we give herein a novel, formal modal argument showing that since it's mathematically possible that human minds are hypercomputers, such minds are in fact hypercomputers.

Comment by CarlShulman on S-risks: Why they are the worst existential risks, and how to prevent them · 2017-07-03T18:54:19.215Z · LW · GW

That's what the congenital deafness discussion was about.

You have preferences over pain and pleasure intensities that you haven't experienced, and over new durations of experiences you do know. Otherwise you wouldn't have anything to worry about re torture, since you haven't experienced it.

Consider people with pain asymbolia:

Pain asymbolia is a condition in which pain is perceived, but with an absence of the suffering that is normally associated with the pain experience. Individuals with pain asymbolia still identify the stimulus as painful but do not display the behavioral or affective reactions that usually accompany pain; no sense of threat and/or danger is precipitated by pain.

Suppose you currently had pain asymbolia. Would that mean you wouldn't object to pain and suffering in non-asymbolics? What if you personally had only happened to experience extremely mild discomfort while having lots of great positive experiences? What about for yourself? If you knew you were going to get a cure for your pain asymbolia tomorrow would you object to subsequent torture as intrinsically bad?

We can go through similar stories for major depression and positive mood.

Seems it's the character of the experience that matters.

Likewise, if you've never experienced skiing, chocolate, favorite films, sex, victory in sports, and similar things that doesn't mean you should act as though they have no moral value. This also holds true for enhanced experiences and experiences your brain currently is unable to have, like the case of congenital deafness followed by a procedure to grant hearing and listening to music.

Comment by CarlShulman on S-risks: Why they are the worst existential risks, and how to prevent them · 2017-07-02T06:38:40.397Z · LW · GW

"My point was comparing pains and pleasures that could be generated with similar amount of resources. Do you think they balance out for human decision making?"

I think with current tech it's cheaper and easier to wirehead to increase pain (i.e. torture) than to increase pleasure or reduce pain. This makes sense biologically, since organisms won't go looking for ways to wirehead to maximize their own pain, evolution doesn't need to 'hide the keys' as much as with pleasure or pain relief (where the organism would actively seek out easy means of subverting the behavioral functions of the hedonic system). Thus when powerful addictive drugs are available, such as alcohol, human populations evolve increased resistance over time. The sex systems evolve to make masturbation less rewarding than reproductive sex under ancestral conditions, desire for play/curiosity is limited by boredom, delicious foods become less pleasant when full or the foods are not later associated with nutritional sensors in the stomach, etc.

I don't think this is true with fine control over the nervous system (or a digital version) to adjust felt intensity and behavioral reinforcement. I think with that sort of full access one could easily increase the intensity (and ease of activation) of pleasures/mood such that one would trade them off against the most intense pains at ~parity per second, and attempts at subjective comparison when or after experiencing both would put them at ~parity.

People will willingly undergo very painful jobs and undertakings for money, physical pleasures, love, status, childbirth, altruism, meaning, etc. Unless you have a different standard for the 'boxes' than is used in subjective comparison with rich experience of the things to be compared, I think we're just haggling over the price re intensity.

We know the felt caliber and behavioral influence of such things can vary greatly. It would be possible to alter nociception and pain receptors to amp up or damp down any particular pain. This could even involve adding a new sense, e.g. someone with congenital deafness could be given the ability to hear (installing new nerves and neurons), and hear painful sounds, with artificially set intensity of pain. Likewise one could add a new sense (or dial one up) to enable stronger pleasures. I think that both the new pains and new pleasures would 'count' to the same degree (and if you're going to dismiss the pleasures as 'wireheading' then you should dismiss the pains too).

" For example, I'd strongly disagree to create a box of pleasure and a box of pain, do you think my preference would go away after extrapolation?"

You trade off pain and pleasure in your own life, are you saying that the standard would be different for the boxes than for yourself?

What are you using as the examples to represent the boxes, and have you experienced them? (As discussed in my link above, people often use weaksauce examples in such comparison.)

Comment by CarlShulman on S-risks: Why they are the worst existential risks, and how to prevent them · 2017-07-01T18:50:01.683Z · LW · GW

"one filled with pleasure and the other filled with pain, feels strongly negative rather than symmetric to us"

Comparing pains and pleasures of similar magnitude? People have a tendency not to do this, see the linked thread.

"Another sign is that pain is an internal experience, while our values might refer to the external world (though it's very murky"

You accept pain and risk of pain all the time to pursue various pleasures, desires and goals. Mice will cross electrified surfaces for tastier treats.

If you're going to care about hedonic states as such, why treat the external case differently?

Alternatively, if you're going to dismiss pleasure as just an indicator of true goals (e.g. that pursuit of pleasure as such is 'wireheading') then why not dismiss pain in the same way, as just a signal and not itself a goal?

Comment by CarlShulman on Increasing GDP is not growth · 2017-03-02T01:55:45.657Z · LW · GW

I meant GWP without introducing the term. Edited for clarity.

Comment by CarlShulman on Increasing GDP is not growth · 2017-02-19T20:28:29.500Z · LW · GW

If you have a constant population, and GDP increases, productivity per person has increased. But if you have a border on a map enclosing some people, and you move it so it encloses more people, productivity hasn't increased.
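A toy arithmetic check on this point (illustrative figures only): merging two regions inside one border raises total GDP, but nobody's output changes, and the merged per-capita figure is just a weighted average of the old ones.

```python
# Two hypothetical regions; redrawing the border to enclose both
# raises total GDP without raising anyone's productivity.
region_a = {"people": 100, "gdp_per_capita": 50_000}
region_b = {"people": 200, "gdp_per_capita": 10_000}

gdp_a = region_a["people"] * region_a["gdp_per_capita"]               # 5,000,000
gdp_b = region_b["people"] * region_b["gdp_per_capita"]               # 2,000,000
merged_gdp = gdp_a + gdp_b                                            # 7,000,000

# Total GDP inside the border went up, but per-person output is
# unchanged for every individual: the merged per-capita number is a
# population-weighted average, not a productivity gain.
merged_per_capita = merged_gdp / (region_a["people"] + region_b["people"])
```

This is exactly the contrast with the migration case below, where per-person output actually rises because the same people become more productive after moving.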

Can you give examples of people confirmed to be actually making the mistake this post discusses? I don't recall seeing any.

The standard economist claim (and the only version I've seen promulgated in LW and EA circles) is that it increases gross world product (total and per capita) because migrants are much more productive when they migrate to developed countries. Here is a set of references and counterarguments.

Separately, some people are keen to increase GDP in particular countries to pay off national fixed costs (like already incurred debts, or military spending).

Comment by CarlShulman on Claim explainer: donor lotteries and returns to scale · 2016-12-31T01:17:33.114Z · LW · GW

I came up with the idea and basic method, then asked Paul if he would provide a donor lottery facility. He did so, and has been taking in entrants and solving logistical issues as they come up.

I agree that thinking/researching/discussing more dominates the gains in the $1-100k range.

Comment by CarlShulman on Optimizing the news feed · 2016-12-02T00:26:05.112Z · LW · GW

A different possibility is identifying vectors in Facebook-behavior space, and letting users alter their feeds accordingly, e.g. I might want to see my feed shifted in the direction of more intelligent users, people outside the US, other political views, etc. At the individual level, I might be able to request a shift in my feed in the direction of individual Facebook friends I respect (where they give general or specific permission).
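A minimal sketch of the idea, assuming posts and users live in some shared embedding space (hypothetical vectors; nothing like this is exposed by Facebook's actual API): re-rank candidate posts by similarity to the user's embedding nudged a fraction of the way toward a target direction, such as the centroid of users with other political views.

```python
import numpy as np

def rerank_feed(post_vecs, user_vec, target_vec, alpha=0.5):
    """Rank posts by similarity to the user's embedding shifted a
    fraction alpha toward a target direction (e.g. the centroid of
    more intelligent users, users outside the US, or a specific
    respected friend, with permission)."""
    shifted = (1 - alpha) * user_vec + alpha * target_vec
    shifted = shifted / np.linalg.norm(shifted)
    scores = post_vecs @ shifted
    return np.argsort(-scores)  # post indices, best match first

# Toy example: three posts in a 2-d behavior space.
posts = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
me = np.array([1.0, 0.0])
other_views = np.array([0.0, 1.0])
order = rerank_feed(posts, me, other_views, alpha=0.5)
# The post between the two viewpoints now ranks first.
```

The alpha knob is what makes this a user-controlled dial rather than a fixed editorial choice: alpha=0 is the status-quo feed, alpha=1 is fully someone else's feed.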

Comment by CarlShulman on Synthetic supermicrobe will be resistant to all known viruses · 2016-11-24T05:08:50.863Z · LW · GW

That advantage only goes so far:

  • Plenty of nonviral bacteria-eating entities exist, and would become more numerous
  • Plant and antibacterial defenses aren't viral-based
  • For the bacteria to compete in the same niche as unmodified versions it has to fulfill a similar ecological role: photosynthetic cyanobacteria with altered DNA would still produce oxygen and provide food
  • It couldn't benefit from exchanging genetic material with other kinds of bacteria

Comment by CarlShulman on Astrobiology III: Why Earth? · 2016-10-07T00:19:07.058Z · LW · GW

Primates and eukaryotes would be good.

Comment by CarlShulman on Quick puzzle about utility functions under affine transformations · 2016-07-16T17:35:24.009Z · LW · GW

Your example has 3 states: vanilla, chocolate, and neither.

But you only explicitly assigned utilities to 2 of them, although you implicitly assigned the state of 'neither' a utility of 0 initially. Then when you applied the transformation to vanilla and chocolate you didn't apply it to the 'neither' state, which altered preferences for gambles over both transformed and untransformed states.

E.g. if we initially assigned u(neither)=0 then after the transformation we have u(neither)=4, u(vanilla)=7, u(chocolate)=12. Then an action with a 50% chance of neither and 50% chance of chocolate has expected utility 8, while the 100% chance of vanilla has expected utility 7.
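A short sketch of the bug, assuming (as the figures above imply) the transformation was u ↦ u + 4, so the original utilities were 0, 3, and 8: applying a positive affine transformation to every state preserves preferences over gambles, but forgetting the 'neither' state flips one.

```python
def expected_utility(lottery, u):
    """Expected utility of a lottery given as {state: probability}."""
    return sum(p * u[s] for s, p in lottery.items())

u = {"neither": 0, "vanilla": 3, "chocolate": 8}
shift = lambda x: x + 4  # a positive affine transformation

gamble = {"neither": 0.5, "chocolate": 0.5}
sure_vanilla = {"vanilla": 1.0}

# Original utilities: the gamble (EU 4.0) beats sure vanilla (EU 3.0).
assert expected_utility(gamble, u) > expected_utility(sure_vanilla, u)

# Transform EVERY state: preferences are preserved (EU 8.0 vs 7.0).
u_all = {s: shift(v) for s, v in u.items()}
assert expected_utility(gamble, u_all) > expected_utility(sure_vanilla, u_all)

# Transform only vanilla and chocolate, leaving u(neither) = 0:
u_partial = {"neither": 0, "vanilla": shift(3), "chocolate": shift(8)}
# The preference now flips: EU(gamble) = 6.0 < EU(vanilla) = 7.0.
assert expected_utility(gamble, u_partial) < expected_utility(sure_vanilla, u_partial)
```

The invariance theorem only licenses transforming the whole utility function at once; transforming a subset of states is a different (non-affine) map on the function, so it can and does reorder gambles.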

Comment by CarlShulman on A toy model of the control problem · 2015-09-18T15:31:48.936Z · LW · GW

Maybe explain how it works when being configured, and then stops working when B gets a better model of the situation/runs more trial-and-error trials?

Comment by CarlShulman on A toy model of the control problem · 2015-09-17T19:15:31.812Z · LW · GW

An illustration with a game-playing AI, see 15:50 and after in the video. The system has a reward function based on bytes in memory, which leads it to pause the game forever when it is about to lose.

Comment by CarlShulman on A toy model of the control problem · 2015-09-17T18:02:50.614Z · LW · GW

That still involves training it with no negative feedback error term for excess blocks (which would overwhelm a mere 0.1% uncertainty).

Comment by CarlShulman on A toy model of the control problem · 2015-09-17T03:02:48.466Z · LW · GW

Of course, with this model it's a bit of a mystery why A gave B a reward function that gives 1 per block, instead of one that gives 1 for the first block and a penalty for additional blocks. Basically, why program B with a utility function so seriously out of whack with what you want when programming one perfectly aligned would have been easy?