Comments
I don't mean this as a criticism - you can both be right - but this is extremely correlated with the updates made by the average Bay Area x-risk reduction-enjoyer over the past 5-10 years, to the extent that it could almost serve as a summary.
It may be useful to know that if events all obey the Markov property (they are probability distributions, conditional on some set of causal parents), then the Reichenbach Common Cause Principle follows (by d-separation arguments) as a theorem. So any counterexamples to RCCP must violate the Markov property as well.
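To illustrate the common-cause case with a toy simulation (my own example, not anything from the post): two variables with no direct link are marginally correlated, but conditioning on their shared causal parent screens the correlation off, exactly as d-separation predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# A toy Markovian model: Z is a common cause of X and Y, with no X-Y edge.
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, np.where(z == 1, 0.9, 0.1))
y = rng.binomial(1, np.where(z == 1, 0.8, 0.2))

# Marginally, X and Y are correlated...
print(np.corrcoef(x, y)[0, 1])  # clearly positive

# ...but conditioning on the common cause Z screens off the dependence,
# which is what the Reichenbach Common Cause Principle requires.
for v in (0, 1):
    print(np.corrcoef(x[z == v], y[z == v])[0, 1])  # ~0
```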
There's also a lot of interesting discussion here.
The idea that "Agents are systems that would adapt their policy if their actions influenced the world in a different way." works well on mechanised CIDs whose variables are neatly divided into object-level and mechanism nodes: we simply check for a path from a utility function F_U to a policy Pi_D. But to apply this to a physical system, we would need a way to obtain such a partition those variables. Specifically, we need to know (1) what counts as a policy, and (2) whether any of its antecedents count as representations of "influence" on the world (and after all, antecedents A of the policy can only be 'representations' of the influence, because in the real world, the agent's actions cannot influence themselves by some D->A->Pi->D loop). Does a spinal reflex count as a policy? Does an ant's decision to fight come from a representation of a desire to save its queen? How accurate does its belief about the forthcoming battle have to be before this representation counts? I'm not sure the paper answers these questions formally, nor am I sure that it's even possible to do so. These questions don't seem to have objectively right or wrong answers.
So we don't really have any full procedure for "identifying agents". I do think we gain some conceptual clarity. But on my reading, this clear definition serves to crystallise how hard it is to identify agents, more so than it shows practically how it can be done.
(NB. I read this paper months ago, so apologies if I've got any of the details wrong.)
Nice. I've previously argued similarly that, if going for tenure, AIS researchers might favour places that are strong in departments other than their own, for inter-departmental collaboration. This would have similar implications to your thinking about recruiting students from other departments. But I also suggested we should favour capital cities, for policy input, and EA hubs, to enable external collaboration. And tenure may be somewhat less attractive for AIS academics than usual, in that, given our abundant funding, we might have reason to favour top-5 postdocs over top-100 tenure.
Feature suggestion: use highlighting for higher-resolution up/downvotes and (dis)agreevotes.
Sometimes you want to indicate what part of a comment you like or dislike, but can't be bothered writing a comment response. In such cases, it would be nice if you could highlight the portion of text that you like/dislike, and for LW to "remember" that highlighting and show it to other users. Concretely, when you click the like/dislike button, the website would remember what text you had highlighted within that comment. Then, if anyone ever wants to see that highlighting, they could hover their mouse over the number of likes, and LW would render the highlighting in that comment.
The benefit would be that readers can conveniently give more nuanced feedback, and writers can have a better understanding of how readers feel about their content. It would reduce the nagging questions of "why was this downvoted", and hopefully reduce the extent to which people talk past each other when arguing.
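To make it concrete, the data you'd need to store per vote is tiny - something like this sketch (field names are purely illustrative, not a claim about LW's actual schema):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HighlightedVote:
    """One vote, plus the text selection (if any) it applies to."""
    comment_id: str
    voter_id: str
    vote_type: str        # e.g. "upvote", "downvote", "agree", "disagree"
    selection_start: int  # character offset of the highlighted span in the comment
    selection_end: int

# On hovering over the vote count, the client fetches these spans and re-renders
# the comment with the stored ranges highlighted.
votes: List[HighlightedVote] = [HighlightedVote("cmt123", "usr456", "downvote", 40, 95)]
```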
The title suggests (weakly perhaps) that the estimates themselves are peer-reviewed. It would be clearer to write "building on a peer-reviewed argument", or similar.
Hi Orellanin,
In the early stages, I had in mind that the more info any individual anon-account revealed, the more easily one could infer what time they spent at Leverage, and therefore their identity. So while I don't know for certain, I would guess that I created anonymoose to disperse this info across two accounts.
When I commented on the Basic Facts post as anonymoose, it was not my intent to contrive a fake conversation between two entities with separate voices. I think this is pretty clear from anonymoose's comment, too - it's in the same bulleted and dry format that throwaway uses, so it's an immediate possibility that throwaway and anonymoose are one and the same. I don't know why I used anonymoose there - maybe due to carelessness, or maybe because I had lost access to throwaway. (I know that at one time an update to the forum login interface did rob me of access to my anon-account, but I'm not sure if this was when that happened.)
"A Russian nuclear strike would change the course of the conflict and almost certainly provoke a "physical response" from Ukraine's allies and potentially from the North Atlantic Treaty Organization, a senior NATO official said on Wednesday.
Any use of nuclear weapons by Moscow would have "unprecedented consequences" for Russia, the official said on the eve of a closed-door meeting of NATO's nuclear planning group on Thursday.
Speaking on condition of anonymity, he said a nuclear strike by Moscow would "almost certainly be drawing a physical response from many allies, and potentially from NATO itself"." - Reuters
https://news.yahoo.com/russian-nuclear-strike-almost-certainly-144246235.html
I have heard talk that the US might instead arm Ukraine with tactical nukes of its own, although I think that would be at least as risky as military retaliation.
The reasoning is that retaliation is US doctrine - they generally respond to hostile actions in kind, to deter them. If Ukraine got nuked, the level of outrage would place intense pressure on Biden to do something, and the hawks would become a lot louder than the doves, similar to after the 9/11 attacks. In the case of Russia, the US has exhausted most non-military avenues already. And the US is a very militaristic country - it has many times bombed countries (Syria, Iraq, Afghanistan, Libya) for much less. So military action just seems very likely. (Involving all of NATO or not, as michel says.)
I think your middle number is clearly too low. The risk scenario does not necessarily require that NATO trigger Article 5, just that it carry out a strategically significant military response, like eliminating Russia's Black Sea Fleet, nuking, or creating a no-fly zone. And Max's 80% makes more sense than your 50% for the union of these possibilities, because it is hard to imagine that the US would stand down without penalising the use of nukes.
I would be at maybe .2*.8*.15=.024 for this particular chain of events leading to major US-Russia nuclear war.
All of these seem to be good points, although I haven't given up on liquidity subsidy schemes yet.
Some reports are not publicised in order not to speed up timelines. And ELK is a bit rambly - I wonder if it will get subsumed by much better content within 2yr. But I do largely agree.
It would be useful to have a more descriptive title, like "Chinchilla's implications for data bottlenecks" or something.
It's noteworthy that the safety guarantee relies on the "hidden cost" (:= proxy_utility - actual_utility) of each action being bounded above. If it's unbounded, then the theoretical guarantee disappears.
For past work on causal conceptions of corrigibility, you should check out this by Jessica Taylor. Quite similar.
It seems like you're saying that the practical weakness of forecasters vs experts is their inability to make numerous causal forecasts. Personally, I think the causal issue is the main one, whereas you think it is that the predictions are so numerous. But they are not always numerous - sometimes you can effect big changes by intervening at a few pivot points, such as elections. And the idea that you can avoid dealing with causal interventions by conditioning on every parent is usually not practical, because conditioning on every parent/confounder means you have to make too many predictions, whereas you can just run one RCT.
You could test this to some extent by asking the forecasters to predict more complicated causal questions. If they lose most of their edge, then you may be right.
I don't think the capital being locked up is such a big issue. You can just invest everyone's money in bonds, and then pay the winner their normal return multiplied by the return of the bonds.
A bigger issue is that you seem to be describing only conditional prediction markets, rather than ones that truly estimate causal quantities, like P(outcome|do(event)). To see this, note that the economy will go down IF Biden is elected, whereas it is not decreased much by causing Biden to be elected. The issue is that economic performance causes Biden to be unpopular to a much greater extent than Biden shapes the economy. To eliminate confounders, you need to randomise the action (the choice of president), or deploy careful causal identification strategies (such as regression discontinuity analysis, or controlling for certain variables, given knowledge of the causal structure of the data-generating process). I discuss this a little more here.
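To make the gap concrete, here's a toy simulation (numbers entirely made up, just to show the direction of the confounding): the conditional market price looks bearish on Biden even though the causal effect of electing him is small.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Toy model: a weak economy makes the incumbent unpopular, so the challenger
# (Biden, say) is more likely to win exactly when the economy is already sliding.
trend = rng.binomial(1, 0.5, n)                            # 1 = economy doing well
biden = rng.binomial(1, np.where(trend == 1, 0.3, 0.7))    # challenger wins more when trend is bad
p_good_next = 0.2 + 0.6 * trend + 0.05 * biden             # the president has only a small effect
economy_next = rng.binomial(1, p_good_next)

# What a conditional prediction market estimates: P(good economy | Biden elected)
print(economy_next[biden == 1].mean())        # ~0.43: "economy goes down IF Biden elected"

# The causal quantity P(good economy | do(elect Biden)): average over the trend,
# i.e. what you'd estimate if the 'action' were randomised.
print(np.mean(0.2 + 0.6 * trend + 0.05 * 1))  # ~0.55: electing Biden barely moves it
```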
I would do thumbs up/down for good/bad, and tick/cross for correct/incorrect.
What do you want to spend most of your time on? What do you think would be the most useful things to spend most of your time on (from a longtermist standpoint)?
You say two things that seem in conflict with one another.
[Excerpt 1] If a system is well-described by a causal diagram, then it satisfies a complex set of statistical relationships. For example ... To an evidential decision theorist, these kinds of statistical relationships are the whole story about causality, or at least about its relevance to decisions.
[Excerpt 2] [Suppose] that there is a complicated causal diagram containing X and Y, such that my beliefs satisfy all of the statistical relationships implied by that causal diagram. EDT recommends maximizing the conditional expectation of Y, conditioned on all the inputs to X. [emphasis added]
In [1], you say that the EDT agent only cares about the statistical relationships between variables, i.e. P(V) over the set of variables V in a Bayes net - a BN that apparently need not even be causal - nothing more.
In [2], you say that the EDT agent needs to know the parents of X. This indicates that the agent needs to know something that is not entailed by P(V), and something that is apparently causal.
Maybe you want the agent to know some causal relationships, i.e. the relationships with decision-parents, but not others?
Under these conditions, it’s easy to see that intervening on X is the same as conditioning on X.
This is true for decisions that are in the support, given the assignment to the parents, but not otherwise. CDT can form an opinion about actions that "never happen", whereas EDT cannot.
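For reference, the identity being invoked is just (in my notation, for a causal Bayes net where pa(X) denotes X's parents):

```latex
P\big(Y \mid \mathrm{do}(X{=}x),\, \mathrm{pa}(X){=}z\big)
  \;=\; P\big(Y \mid X{=}x,\, \mathrm{pa}(X){=}z\big),
\qquad \text{provided } P\big(X{=}x \mid \mathrm{pa}(X){=}z\big) > 0,
```

and the right-hand side is simply undefined for zero-probability actions, which is the gap CDT doesn't have.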
Many people don't realise how effective migraine treatments are. High-dose aspirin, triptans, and preventers all work really well, and can often reduce migraine severity by 50-90%.
Also, most don't yet realise how effective semaglutide is for weight loss, because previous weight-loss drugs were generally much less effective or had much worse side-effects.
Balding treatments (finasteride and topical minoxidil) are also pretty good for a lot of people.
Another possibility is that most people were reluctant to read, summarise, or internalise Putin's writing on Ukraine due to finding it repugnant, because they aren't decouplers.
Off the top of my head, maybe it's because Metaculus presents medians, and the median user neither investigates the issue much, nor trusts those who do (Matt Y, Scott A), and just roughly follows base rates. I also feel there was some wishful thinking, and that the fullness of the invasion was at least somewhat intrinsically surprising.
Nice idea. But if you set C at like 10% of the correct price, then you're going to sell 90% of the visas on the first day for way too cheap, so you can lose almost all of the market surplus.
Yeah I think in practice auctioning every day or two would be completely adequate - that's much less than the latency involved in dealing with lawyers and other aspects of the process. So now I'm mostly just curious about whether there's a theory built up for these kinds of problems in the continuous time case.
My feeble attempts here.
Yes. And, the transformer-based WordTune is complementary - better for copyediting/rephrasing, rather than narrow grammatical correctness.
We do not have a scientific understanding of how to tell a superintelligent machine to "solve problem X, without doing something horrible as a side effect", because we cannot describe mathematically what "something horrible" actually means to us...
Similar to how utility theory (from von Neumann and so on) is excellent science/mathematics despite our not being able to state what utility is. AI Alignment hopes to tell us how to align AI, not the target to aim for. Choosing the target is also a necessary task, but it's not the focus of the field.
In terms of trying to formulate rigorous and consistent definitions, a major goal of the Causal Incentives Working Group is to analyse features of different problems using consistent definitions and a shared framework. In particular, our paper "Path-specific Objectives for Safer Agent Incentives" (AAAI-2022) will go online in about a month, and should serve to organise a handful of papers in AIS.
Exactly. Really, the title should be "Six specializations make you world-class at a combination of skills that is probably completely useless." Productivity is a function of your skills. The fact that you are "world class" in a random combination of skills is only interesting if people are systematically under-estimating the degree to which random skills can be usefully combined. If there are reasons to believe that, then I would be interested in reading about them.
Transformer models (like GPT-3) are generators of human-like text, so they can be modelled as quantilisers. However, any quantiliser guarantees are very weak, because they quantilise with very low q, equal to the likelihood that a human would generate that prompt.
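For reference, the quantiliser guarantee I have in mind (from Taylor's 2016 paper, roughly stated from memory) is that for any non-negative cost function C and base distribution γ,

```latex
\mathbb{E}_{q\text{-quantiliser}}[C] \;\le\; \frac{1}{q}\,\mathbb{E}_{\gamma}[C],
```

so when q is on the order of the probability that a human would produce that exact text, the 1/q factor is astronomical and the bound says essentially nothing.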
The most plausible way out seems to be for grantmakers to grant money conditionally on work being published as open source. Some grantmakers may benefit from doing this, despite losing some publication prestige, because the funded work will be read more widely, and the grantmaker will look like they are improving the scientific process. Researchers lose some prestige, but gain some funding. Not sure how well this has worked so far, but perhaps we could get to the world where this works, if we're not already there.
It would be useful to have a clarification of these points, to know how different of an org you actually encountered, compared to the one I did when I (briefly) visited in 2014.
It is not true that people were expected to undergo training by their manager.
OK, but did you have any assurance that the information from charting was kept confidential from other Leveragers? I got the impression Geoff charted people who he raised money from, for example, so it at least raises the question whether information gleaned from debugging might be discussed with that person's manager.
“being experimented on” was not my primary purpose in joining nor would I now describe it as a main focus of my time at Leverage.
OK, but would you agree that a primary activity of Leverage was to do psych/sociology research, and a major (>=50%) methodology for that was self-experimentation?
I did not find the group to be overly focused on “its own sociology.”
OK, but would you agree that at least ~half of the group spent at least ~half of their time studying psychology and/or sociology, using the group as subjects?
The stated purpose of Leverage 1.0 was not to literally take over the US and/or global governance or “take over the world,”...OPs claim is false.
OK, but you agree that it was to ensure "global coordination" and "the impossibility of bad governments", per the plan, right? Do you agree that "the vibe was 'take over the world'", per the OP?
I did not believe or feel pressured to believe that Leverage was “the only organization with a plan that could possibly work.”
OK, but would you agree that many staff said this, even if you personally didn't feel pressured to take the belief on?
I did not find “Geoff’s power and prowess as a leader [to be] a central theme.”
OK, but did you notice staff saying that he was one of the great theorists of our time? Or that a significant part of the hope for the organisation was to adapt and deploy certain ideas of his, like connection theory (which "solved psychology"), to deal with cases involving multiple individuals, in order to design larger orgs, memes, etc.?
Hopefully, the answers to these questions can be mostly separated from our subjective impressions. That might sound harsh, or resemble a cross-examination. But it seems necessary in order to figure out to what extent we can reach a shared understanding of "common knowledge facts", at least about different moments in LR's history (potentially also differing in our interpretations), versus the facts themselves actually being contested.
Thanks for your courage, Zoe!
Personally, I've tried to maintain anonymity in online discussion of this topic for years. I dipped my toe into openly commenting last week, and immediately received an email that made it more difficult to maintain anonymity - I was told "Geoff has previously speculated to me that you are 'throwaway', the author of the 2018 basic facts post". Firstly, I very much don't appreciate my ability to maintain anonymity being narrowed like this. Rather, anonymity is a helpful defense in any sensitive online discussion, not least this one. But yes, throwaway/anonymoose is me - I posted anonymously so as to avoid adverse consequences from friends who got more involved than me. But I'm not throwaway2, anonymous, or BayAreaHuman - those three are bringing evidence that is independent from me at least.
I only visited Leverage for a couple months, back in 2014. One thing that resonated strongly with me about your post is that the discussion is badly confused by lack of public knowledge and strong narratives, about whether people are too harsh on Leverage, what biases one might have, and so on. This is why I think we often retreat to just stating "basic" or "common knowledge" facts; the facts cut through the spin.
Continuing in that spirit, I personally can attest that much of what you have said is true, and the rest congruent with the picture I built up there. They dogmatically viewed human nature as nearly arbitrarily changeable. Their plan was to study how to change their psychology, to turn themselves into Elon Musk type figures, to take over the world. This was going to work because Geoff was a legendary theoriser, Connection Theory had "solved psychology", and the resulting debugging tools were exceptionally powerful. People "worked" for ~80 hours a week - which demonstrated the power of their productivity coaching.
Power asymmetries and insularity were present to at least some degree. I personally didn't encounter an NDA, or talk of "demons" etc. Nor did I get a solid impression of the psychological effects on people from that short stay, though of course there must have been some.
What's frustrating about still hearing noisy debate on this topic, so many years later, is that Leverage being a really bad org seems overdetermined at this point. On the one hand, if I ranked MIRI, CFAR, CEA, FHI, and several startups I've visited, in terms of how reality-distorting they can be, Leverage would score ~9, while no other would surpass ~7. (It manages to be nontransparent and cultlike in other ways too!). While on the other hand, their productive output was... also like a 2/10? It's indefensible. But still only a fraction of the relevant information is in the open.
As you say, it'll take time for people to build common understanding, and to come to terms with what went down. I hope the cover you've offered will lead some others to feel comfortable sharing their experiences, to help advance that process.
As in, 5+ years ago, around when I'd first visited the Bay, I remember meeting up 1:1 with Geoff in a cafe. One of the things I asked, in order to understand how he thought about EA strategy, was what he would do if he wasn't busy starting Leverage. He said he'd probably start a cult, and I don't remember any indication that he was joking whatsoever. I'd initially drafted my comment as "he told me, unjokingly", except that it's a long time ago, so I don't want to give the impression that I'm quite that certain.
He's also told me, deadpan, that he would like to be starting a cult if he wasn't running Leverage.
Your comparison does a disservice to the human's sample efficiency in two ways:
- You're counting diverse data in the human's environment, but you're not comparing their performance on diverse tasks. Humans are obviously better than GPT-3 at interactive tasks, walking around, etc. For either kind of fair comparison - text data & task, or diverse data & task - the human has far superior sample efficiency.
- "fancy learning techniques" don't count as data. If the human can get mileage out of them, all the better for the human's sample efficiency.
So you seem to have it backwards when you say that the comparison that everyone is making is the "bad" one.
I think this becomes a lot clearer if we distinguish between total and marginal thinking. GPT-3's total sample efficiency for predicting text is poor:
- To learn to predict text, GPT-3 has to read >1000x as much text as a human can read in their lifetime (rough numbers in the sketch below).
- To learn to win at go, AlphaGo has to play >100x as many games as a human could play in their lifetime.
But on-the-margin, it's very sample efficient at learning to perform new text-related tasks:
- GPT-3 can learn to perform a new text-related task as easily as a human can.
Essentially, what's happened is GPT-3 is a kind-of mega-analytical-engine that was really sample inefficient to train up to its current level, but that can now be trained to do additional stuff at relatively little extra cost.
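For a rough sense of scale behind the ">1000x" line above (very approximate numbers: GPT-3 saw roughly 3×10^11 tokens during training, and a person reading ~200 words a minute for a couple of hours a day over 50 years reads well under a billion words):

```latex
200\,\tfrac{\text{words}}{\text{min}} \times 120\,\tfrac{\text{min}}{\text{day}} \times 365 \times 50
  \;\approx\; 4\times 10^{8}\ \text{words},
\qquad
\frac{3\times 10^{11}\ \text{tokens}}{4\times 10^{8}\ \text{words}} \;\approx\; 10^{3}.
```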
Does that resolve the sense of confusion/mystery, or is there more to it that I'm missing?
Can you clarify whether you're talking about "30% of X" i.e. 0.3*X, or "30% off X", i.e. 0.7*X?
Thanks; this language and these links are very useful.
Thanks. These algorithms seem like they would be better for passing the independence of clone alternatives criterion.
I imagine you could catch useful work with i) models of AI safety, or ii) analysis of failure modes, or something, though I'm obviously biased here.
The implication seems to be that this RFP is for AIS work that is especially focused on DL systems. Is there likely to be a future RFP for AIS research that applies equally well to DL and non-DL systems? Regardless of where my research lands, I imagine a lot of useful and underfunded research fits in the latter category.
To me, his main plausible × important claim is that performance is greatly improved by subject specialisation from age <5. The fact that many geniuses enter their fields late doesn't falsify this, since early specialisation isn't humdrum at all - barely one in a million kids specialise in that way. I think that people who enter a field such as CS at age 30, rather than at age 20, do have a mild disadvantage, maybe 0.5SD. So I wouldn't be surprised if starting at age 4, rather than at age 20, gave you another 1-2SD advantage. Of course, subjects like CS do benefit a lot from basic maths and so on, but there are a lot of other things that they don't benefit from, which could easily be cut from the curriculum. If this sort of thing were true, it would quite profoundly undercut the standard EA advice to keep one's options open.
Have you considered just doing some BTC/BTC-PERP arbitrage, or betting on politics and sports? You'd probably learn what skills they're looking for, gain some of them, and make money while you're at it...
Thanks for these thoughts about the causal agenda. I basically agree with you on the facts, though I have a more favourable interpretation of how they bear on the potential of the causal incentives agenda. I've paraphrased the three bullet points, and responded in reverse order:
3) Many important incentives are not captured by the approach - e.g. sometimes an agent has an incentive to influence a variable, even if that variable does not cause reward attainment.
-> Agreed. We're starting to study "side-effect incentives" (improved name pending), which have this property. We're still figuring out whether we should just care about the union of SE incentives and control incentives, or whether (or when) SE incentives should be considered less dangerous. Whether the causal style of incentive analysis captures much of what we care about will, I think, be borne out by applying it and alternatives to a bunch of safety problems.
2) sometimes we need more specific quantities than just "D affects A".
-> Agreed. We've privately discussed directional quantities like "do(D=d) causes A=a" as being more safety-relevant, and are happy to hear other ideas.
1) eliminating all control-incentives seems unrealistic
-> Strongly agree it's infeasible to remove CIs on all variables. My more modest goal would be to prove that, for particular variables (or classes of variables) such as a shutdown button, or a human's values, we can either: 1) show how to remove control (+ side-effect) incentives, or 2) show why this is impossible, given realistic assumptions. If (2), then that theoretical case could justify allocating resources to learning-oriented approaches.
Overall, I concede that we haven't engaged much with safety issues in the last year. Partly, that's because the projects have had to fit within people's PhDs, which will also be true this year. But with some of the framework stuff behind us, we should be able to study safety more, and gain a sense of how addressable concerns like these are, and of the extent to which causal decision problems/games are a really useful ontology for AI safety.
I think moving to the country could possibly be justified despite harms to recruitment and the rationality community, but in the official MIRI explanations, the downsides are quite underdiscussed.
This list is pretty relevant too: http://culturaltribes.org/home
Interesting that about half of these "narratives" or "worldviews" are suffixed with "-ism": Malthusianism, Marxism, Georgism, effective altruism, transhumanism. But most of the (newer and less popular) rationalist narratives haven't yet been so named. This would be one heuristic for finding other worldviews.
More generally, if you want people to know and contrast a lot of these worldviews, it'd be useful to name them all in 1-2 words each.