Comments
I agree that the first can be framed as a meta-crux, but actually I think the way you framed it is more of an object-level forecasting question, or perhaps a strong prior on the forecasted effects of technological progress. If on the other hand you framed it more as conflict theory vs. mistake theory, then I'd say that's more on the meta level.
For the second, I agree that's true for some people, but I'm skeptical of how prevalent the cosmopolitan view is, which is why I didn't include it in the post.
One final thing is that I typically didn't emphasize loss of control / superintelligence / recursive self-improvement. I didn't hide it, but I included it in a longer list of threat models.
I'd be very interested to see that longer threat model list!
[Cross-commenting from the EA Forum.]
[Disclaimers: My wife Deena works with Kat as a business coach. I briefly met Kat and Emerson while visiting in Puerto Rico and had positive interactions with them. My personality is such that I have a very strong inclination to try to see the good in others, which I am aware can bias my views.]
A few random thoughts related to this post:
1. I appreciate the concerns over the potential for personal retaliation, and the other factors mentioned by @Habryka and others for why it might be good not to delay this kind of post. I think those concerns and factors are serious and should definitely not be ignored. That said, I want to point out a different type of harm in the other direction that posting this kind of thing without waiting for a response can cause: reputational damage. As others have pointed out, many people seem to update more strongly on negative reports that come first and less on subsequent follow-up rebuttals. If the accusations turn out to be demonstrably false in critically important ways, then even if that comes to light later, the reputational damage to Kat, Emerson, and Drew may by then be irrevocable.
Reputation is important almost everywhere, but in my anecdotal experience it seems to be even more important in EA than in many other spheres. Many people in EA seem to have a very strong in-group bias towards favoring other "EAs", and it has long seemed to me that (for example) getting a grant from an EA organization often feels like it's even more about having strong EA personal connections than grants elsewhere are. (This is not to say that personal connections aren't important for securing other types of grants or deals, and it's definitely not to say that getting an EA grant is only or even mostly about having strong EA connections. But from my own personal experience and from talking to quite a few others both in and out of EA, this is definitely how it feels to me. Note that I have received multiple EA grants in the past, and I have helped other people apply for and receive substantial EA grants.) I really don't like this sort of dynamic and I've low-key complained about it for a long time - it feels unprofessional and raises all sorts of in-group bias flags. My sense is that a lot of EA orgs have gotten somewhat better about this over time. But I think it is still a factor.
Additionally, it sometimes feels to me that EA Forum dynamics tend to lead to very strongly upvoting posts and comments that are critical of people or organizations, especially if they're more "centrally connected" in EA, while ignoring or even downvoting posts and comments in the other direction. I am not sure why the dynamic feels like this, and maybe I'm wrong about it really being a thing at all. Regardless, I strongly suspect that any subsequent rebuttal by Nonlinear would receive significantly fewer views and upvotes, even if the rebuttal were actually very strong.
Because of all this, I think that the potential for reputational harm towards Kat, Emerson, and Drew may be even greater than if this were in the business world or some other community. Even if they somehow provide unambiguous evidence that refutes almost everything in this post, I would not be terribly surprised if their potential to get EA funding going forward or to collaborate with EA orgs was permanently ended. In other words, I wouldn't be terribly surprised if this post spelled the end of their "EA careers" even if the central claims all turned out to be false. My best guess is that this is not the most likely scenario, and that if they provide sufficiently good evidence then they'll be most likely "restored" in the EA community for the most part, but I think there's a significant chance (say 1%-10%) that this is basically the end of their EA careers regardless of the actual truth of the matter.
Does any of this outweigh the factors mentioned by @Habryka? I don't know. But I just wanted to point out a possible factor in the other direction that we may want to consider, particularly if we want to set norms for how to deal with other such situations going forward.
2. I don't have any experience with libel law or anything of the sort, but my impression is that suing for libel over this kind of piece is very much within the range of normal responses in the business world, even if in the EA world it is basically unheard of. So if your frame of reference is the world outside of EA, then suing seems at least like a reasonable response, while if your frame of reference is the EA community, then maybe it doesn't. I'll let others weigh in on whether my impressions on this are correct, but I didn't notice others bring this up so I figured I'd mention it.
3. My general perspective on these kinds of things is that... well, people are complicated. We humans often seem to have this tendency to want our heroes to be perfect and our villains to be horrible. If we like someone we want to think they could never do anything really bad, and unless presented with extremely strong evidence to the contrary we'll look for excuses for their behavior so that it matches our pictures of them as "good people". And if we decide that they did do something bad, then we label them as "bad people" and retroactively reject everything about them. And if that's hard to do we suffer from cognitive dissonance. (Cf. halo effect.)
But the reality, at least in my opinion, is that things are more complicated. It's not just that there are shades of grey, it's that people can simultaneously be really good people in some ways and really bad people in other ways. Unfortunately, it's not at all a contradiction for someone to be a genuinely kind, caring, supportive, and absolutely wonderful person towards most of the people in their life, while simultaneously being a sexual predator or committing terrible crimes.
I'm not saying that any of the people mentioned in this post necessarily did anything wrong at all. My point here is mostly just to point out something that may be obvious to almost all of us, but which feels potentially relevant and probably bears repeating in any case. Personally, I suspect that everybody involved was acting in what they perceived to be good faith and was genuinely trying to do the right thing; they're just looking at the situation through lenses shaped by very different perspectives and experiences, and so they're coming to very different conclusions. (But see my disclaimer at the beginning of this comment about my personality bias coloring my own perspective.)
Any chance we can get an Android app version?
The more I think about this post, the more I think it captures my frustrations with a large percentage of the public discourse on AI x-risks, and not just this one debate event.
You should make this a top level post so it gets visibility. I think it's important for people to know the caveats attached to your results and the limits on its implications in real-world dynamics.
When you say that you'd give different probability estimates on different days, do you think you can represent that as you sampling on different days from a probability distribution over your "true" latent credence? If yes, do you think it would be useful to try to estimate what that distribution looks like, and then report the mean or perhaps the 90% CI or something like that? So for example, if your estimate typically ranges between 33% and 66% depending on the day with a mean of say 50%, then instead of reporting what you think today (the equivalent of taking a single random sample from the distribution), maybe you could report 50% because that's your mean and/or report that your estimate typically ranges from 33% to 66%.
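To make this concrete, here's a minimal sketch of the kind of summary I have in mind. The daily numbers are made up for illustration, and the empirical quantiles are just one simple way to summarize the spread:

```python
# Minimal sketch: treat day-to-day estimates as noisy samples from a latent
# credence, then report a summary instead of a single day's draw.
# (All numbers below are hypothetical.)
import numpy as np

daily_estimates = np.array([0.35, 0.50, 0.62, 0.41, 0.55, 0.66, 0.33, 0.48])

mean_estimate = daily_estimates.mean()                   # report this as the headline number
low, high = np.quantile(daily_estimates, [0.05, 0.95])   # rough "typical range"

print(f"mean ~= {mean_estimate:.2f}, typical range ~= {low:.2f} to {high:.2f}")
```

With enough days of data you could also fit a parametric distribution (e.g. a Beta) instead of using the empirical quantiles, but either way the point is the same: report the summary rather than a single day's sample.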
From a Facebook discussion with Scott Aaronson yesterday:
Yann: I think neither Yoshua nor Geoff believe that AI is going kill us all with any significant probability.
Scott: Well, Yoshua signed the pause letter, and wrote an accompanying statement about what he sees as the risk to civilization (I agree that there are many civilizational risks short of extinction). In his words: “No one, not even the leading AI experts, including those who developed these giant AI models, can be absolutely certain that such powerful tools now or in the future cannot be used in ways that would be catastrophic to society.”
Geoff said in a widely-shared recent video that it’s “not inconceivable” that AI will wipe out humanity, and didn’t offer any reassurances about it being vanishingly unlikely.
https://yoshuabengio.org/2023/04/05/slowing-down-development-of-ai-systems-passing-the-turing-test/
https://twitter.com/JMannhart/status/1641764742137016320
Yann: [to Scott Aaronson] He is worried about catastrophic disruptions of the political, economic, and environmental systems. I don't want to speak for him, but I doubt he worries about a Yuddite-style uncontrollable "hard takeoff".
The conversation took place in the comments section to something I posted on Facebook: https://m.facebook.com/story.php?story_fbid=pfbid0qE1PYd3ijhUXVFc9omdjnfEKBX4VNqj528eDULzoYSj34keUbUk624UwbeM4nMyNl&id=100010608396052&mibextid=Nif5oz
Sometimes it's better in the long run to take a good chunk of time off to do things for fun and write or work less. Sometimes less is more. But this is very much a YMMV thing.
This is actually another related area of my research: To the extent that we cannot get people to sit down and agree on double cruxes, can we still assign some reasonable likelihoods and/or uncertainty estimates for those likelihoods? After all, we do ultimately need to make decisions here! Or if it turns out that we literally cannot use any numbers here, how do we best make decisions anyway?
I have now posted a "Half-baked AI safety ideas thread" (LW version, EA Forum version) - let me know if that's more or less what you had in mind.
Just putting in my vote for doing both broader and deeper explorations of these topics!
My impression - which I kind of hope is wrong - has been that it is much easier to get an EA grant the more you are an "EA insider" or have EA insider connections. The only EA connection that my professor has is me. On the other hand, I understand the reluctance to some degree in the case of AI safety because funders are concerned that researchers will take the money and go do capabilities research instead.
Honestly I suspect this is going to be the single largest benefit from paying Scott to work on the problem. Similarly, when I suggested in an earlier comment that we should pay other academics in a similar manner, in my mind the largest benefit of doing so is because that will help normalize this kind of research in the wider academic community. The more respected researchers there are working on the problem, the more other researchers start thinking about it as well, resulting (hopefully) in a snowball effect. Also, researchers often bring along their grad students!
Hopefully. I have a feeling it won't be so easy, but we'll see.
Yes! I actually just discussed this with one of my advisors (an expert on machine learning), and he told me that if he could get funding to do it he would definitely be interested in dedicating a good chunk of his time to researching AGI safety. (For any funders who might read this and might be interested in providing that funding, please reach out to me by email Aryeh.Englander@jhuapl.edu. I'm going to try to reach out to some potential funders next week.)
I think that there are a lot of researchers who are sympathetic to AI risk concerns, but they either lack the funding to work on it or they don't know how they might apply their area of expertise to do so. The former can definitely be fixed if there's an interest from funding organizations. The latter can be fixed in many cases by reaching out and talking to the researcher.
It also depends on your target audience. (Which is basically what you said, just in slightly different words.) If you want to get Serious Researchers to listen to you and they aren't already within the sub-sub-culture that is the rationality community and its immediate neighbors, then in many (most?) cases ranting and freaking out is probably going to be actively counterproductive to your cause. Same if you're trying to build a reputation as a Serious Researcher, with a chance that decision makers who listen to Serious Researchers might listen to you. On the other hand, if your target audience is people who already trust you or who are already in your immediate sub-sub-tribe, and you don't mind risking being labeled a crackpot by the wider world, then I can see why visibly freaking out could be helpful.
[Also, it goes without saying that not everybody agrees with Eliezer's probability-of-doom estimates. Depending on your relative probabilities it might make perfect sense to work in a random startup, have a 401k, not visibly freak out, etc.]
I'm pretty sure that's the whole purpose of having province governors and sub-kingdoms, and various systems in place to ensure loyalty. Every empire in history did this, to my knowledge. The threat of an imperial army showing up on your doorstep if you fail to comply has historically been sufficient to ensure loyalty, at least while the empire is strong.
We have a points system in our family to incentivize the kids to do their chores. But we have to regularly update the rules because it turns out that there are ways to optimize for the points that we didn't anticipate and that don't really reflect what we actually want the kids to be incentivized to do. Every time this happens I think - ha, alignment failure!
https://www.lesswrong.com/posts/XFBHXu4YNqyF6R3cv/pitching-an-alignment-softball
Alexey Turchin and David Denkenberger describe several scenarios here: https://philpapers.org/rec/TURCOG-2 (additional recent discussion in this comment thread)
Eliezer's go-to scenario (from his recent post):
The concrete example I usually use here is nanotech, because there's been pretty detailed analysis of what definitely look like physically attainable lower bounds on what should be possible with nanotech, and those lower bounds are sufficient to carry the point. My lower-bound model of "how a sufficiently powerful intelligence would kill everyone, if it didn't want to not do that" is that it gets access to the Internet, emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they're dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery. (Back when I was first deploying this visualization, the wise-sounding critics said "Ah, but how do you know even a superintelligence could solve the protein folding problem, if it didn't already have planet-sized supercomputers?" but one hears less of this after the advent of AlphaFold 2, for some odd reason.) The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth's atmosphere, get into human bloodstreams and hide, strike on a timer. Losing a conflict with a high-powered cognitive system looks at least as deadly as "everybody on the face of the Earth suddenly falls over dead within the same second".
https://www.gwern.net/fiction/Clippy (very detailed but also very long and very full of technical jargon; on the other hand, I think it's mostly understandable even if you have to gloss over most of the jargon)
Please describe or provide links to descriptions of concrete AGI takeover scenarios that are at least semi-plausible, and especially takeover scenarios that result in human extermination and/or eternal suffering (s-risk). Yes, I know that the arguments don't necessarily require that we can describe particular takeover scenarios, but I still find it extremely useful to have concrete scenarios available, both for thinking purposes and for explaining things to others.
One of the most common proposals I see people raise (once they understand the core issues) is some form of, "can't we just use some form of slightly-weaker safe AI to augment human capabilities and allow us to bootstrap to / monitor / understand the more advanced versions?" And in fact lots of AI safety agendas do propose something along these lines. How would you best explain to a newcomer why Eliezer and others think this will not work? How would you explain the key cruxes that make Eliezer et al think nothing along these lines will work, while others think it's more promising?
[Note that two-axis voting is now enabled for this post. Thanks to the mods for allowing that!]
This website looks pretty cool! I didn't know about this before.
I haven't even read the post yet, but I'm giving a strong upvote in favor of promoting the norm of posting unpopular critical opinions.
I forgot about downvotes. I'm going to add this to the guidelines.
Obligatory link to the excellent AGI Safety Fundamentals curriculum.
Background material recommendations (more in depth): Please recommend your favorite AGI safety background reading / videos / lectures / etc. For this sub-thread more in-depth recommendations are allowed, including material that requires technical expertise of some sort. (Please specify what kind of background knowledge / expertise is required to understand the material you're recommending.) This is also the place to recommend general resources people can look at if they want to start doing a deeper dive into AGI safety and related topics.
Background material recommendations (popular-level audience, several hours time commitment): Please recommend your favorite basic AGI safety background reading / videos / lectures / etc. For this sub-thread please only recommend background material suitable for a popular level audience. Time commitment is allowed to be up to several hours, so for example a popular-level book or sequence of posts would work. Extra bonus for explaining why you particularly like your suggestion over other potential suggestions, and/or for elaborating on which audiences might benefit most from different suggestions.
Background material recommendations (popular-level audience, very short time commitment): Please recommend your favorite basic AGI safety background reading / videos / lectures / etc. For this sub-thread please only recommend background material suitable for complete newcomers to the field, with a time commitment of at most 1-2 hours. Extra bonus for explaining why you particularly like your suggestion over other potential suggestions, and/or for elaborating on which audiences might benefit most from different suggestions.
Quick thought: What counts as a "company", and what counts as "one year of effort"? If Alphabet's board and directors decided for some reason to divert 99% of the company's resources towards buying up coal companies and thereby became a world leader in the coal industry, would that count? What if Alphabet didn't buy the companies outright but instead headhunted all of their employees and bought all the necessary hardware and infrastructure?
Similarly, you specified that it needs to be a "tech company", but what exactly differentiates a tech company from a regular company? (For this at least I'm guessing there's likely a standard definition, I just don't know what it is.)
It seems to me that the details here can make a huge difference for predictions at least.
A friend pointed out on Facebook that Gato uses TPU-v3's. Not sure why - I thought Google already had v4's available for internal use a while ago? In any case, the TPU-v4 might potentially help a lot for the latency issue.
"More specifically, says my Inner Eliezer, it is less helpful to reason from or about one's priors about really smart, careful-thinking people making or not making mistakes, and much more helpful to think directly about the object-level arguments, and whether they seem true."
When you say it's much more helpful, do you mean it's helpful for (a) forming accurate credences about which side is in fact correct, or do you just mean it's helpful for (b) getting a much deeper understanding of the issues? If (b) then I totally agree. If (a) though, why would I expect myself to achieve a more accurate credence about the true state of affairs than any of the people in this argument? If it's because they've stated their arguments for all the world to see so now anybody can go assess those arguments - why should I think I can better assess those arguments than Eliezer and his interlocutors? They clearly still disagree with each other despite reading all the same things I'm reading (and much more, actually). And add to that the fact that Eliezer is essentially saying in these dialogues that he has private reasoning and arguments that he cannot properly express and nobody seems to understand, in which case we have no choice but to do a secondary assessment of how likely he is to have good arguments of that type, or else to form our credences while completely ignoring the possible existence of a very critical argument in one direction.
Sometimes assessments of the argument maker's cognitive abilities and access to relevant knowledge / expertise are in fact the best way to get the most accurate credence you can, even if that's not ideal.
(This is all just repeating standard arguments in favor of modest epistemology, but still.)
Heh, no problem. At least I changed my LessWrong username from Iarwain to my real name a while back.
Darn, there goes my ability to use Iarwain as a really unusual pseudonym. I've used it off and on for almost 20 years, ever since my brother made me a new email address right after having read the LOTR appendixes.
Thanks, looks useful!
Yes please!
Thanks!
How about "the words 'hello world!' written on a piece of paper"? Or you could substitute "on a computer screen" instead of a piece of paper, or you could just leave out the writing medium entirely. I'm curious whether it can handle simple words if asked specifically for them.
Yes, I'm aware of that. But that's a yearly list, and I'm asking for all-time favorites.
I keep having kind of off-the-cuff questions I would love to ask the community, but I don't know where the right place is to post those questions. I don't usually have the time to go polish up the questions so that they are high quality, cite appropriate sources and previous discussions, etc., but I would still like them answered! Typically these are the types of questions I might post on Facebook, but I think I would get higher quality answers here.
Do questions of this sort belong as question posts, shortform posts, or comments on the monthly open threads? Or do they in fact belong on Facebook and not here, since they are not at all polished or well researched beyond some quick Google searches? And if I ask as a shortform post or as a comment on the open thread, will that get only a small fraction of the attention (and therefore the responses) that it would get as a separate question post?
My general impression based on numerous interactions is that many EA orgs are specifically looking to hire and work with other EAs, many longtermist orgs are looking to specifically work with longtermists, and many AI safety orgs are specifically looking to hire people who are passionate about existential risks from AI. I get this to a certain extent, but I strongly suspect that ultimately this may be very counterproductive if we are really truly playing to win.
And it's not just in terms of who gets hired. Maybe I'm wrong about this, but my impression is that many EA funding orgs are primarily looking to fund other EA orgs. I suspect that a new and inexperienced EA org may have an easier time getting funded to work on a given project than if a highly experienced non-EA org would apply for funding to pursue the same idea. (Again, entirely possible I'm wrong about that, and apologies to EA funding orgs if I am mis-characterizing how things work. On the other hand, if I am wrong about this then that is an indication that EA orgs might need to do a better job communicating how their funding decisions are made, because I am virtually positive that this is the impression that many other people have gotten as well.)
One reason why this selectivity kind of makes sense at least for some areas like AI safety is because of infohazard concerns, where if we get people who are not focused on the long-term to be involved then they might use our money to do capability enhancement research instead of pursuing longtermist goals. Again, I get this to a certain extent, but I think that if we are really playing to win then we can probably use our collective ingenuity to find ways around this.
Right now this focus on only looking for other EAs appears (to me, at least) to be causing an enormous bottleneck for achieving the goals we are ultimately aiming for.
Also note that Percy Liang's Stanford Center for Research on Foundation Models seems to have a strong focus on potential risks as well as potential benefits. At least that's how it seemed to me based on their inaugural paper and on a lot of the talks at the associated workshop last year.
I think part of what I was reacting to is a kind of half-formed argument that goes something like:
- My prior credence is very low that all these really smart, carefully thought-through people are making the kinds of stupid or biased mistakes they are being accused of.
- In fact, my prior for the above is sufficiently low that I suspect it's more likely that the author is the one making the mistake(s) here, at least in the sense of straw-manning his opponents.
- But if that's the case then I shouldn't trust the other things he says as much, because it looks like he's making reasoning mistakes himself or else he's biased.
- Therefore I shouldn't take his arguments so seriously.
Again, this isn't actually an argument I would make. It's just me trying to articulate my initial negative reactions to the post.
Meta-comment:
I noticed that I found it very difficult to read through this post, even though I felt the content was important, because of the (deliberately) condescending style. I also noticed that I'm finding it difficult to take the ideas as seriously as I think I should, again due to the style. I did manage to read through it in the end, because I do think it's important, and I think I am mostly able to avoid letting the style influence my judgments. But I find it fascinating to watch my own reaction to the post, and I'm wondering if others have any (constructive) insights on this.
In general I've noticed that I have a very hard time reading things that are written in a polemical, condescending, insulting, or ridiculing manner. This is particularly true, of course, if the target is a group / person / idea that I happen to like. But even if it's written by someone on "my side" I find I have a hard time getting myself to read it. There have been several times when I've been told I should really go read a certain book, blog, article, etc., and that it has important content I should know about, but I couldn't get myself to read the whole thing due to the polemical or insulting way in which it was written.
Similarly, as I noted above, I've noticed that I often have a hard time taking ideas as seriously as I probably should if they're written in a polemical / condescending / insulting / ridiculing style. I think maybe I tend to down-weight the credibility of anybody who writes like that, and by extension maybe I subconsciously down-weight the content? Maybe I'm subconsciously associating condescension (at least towards ideas / people I think of as worth taking seriously) with bias? Not sure.
I've heard from other people that they especially like polemical / condescending articles, and I imagine that it is effective / persuasive for a lot of readers. For all I know this is far and away the most effective way of writing this kind of thing. And even if not, Eliezer is perfectly within his rights to use whatever style he wants. Eliezer explicitly acknowledges the condescending-sounding tone of the article, but felt it was worth writing it that way anyway, and that's fine.
So to be clear: This is not at all a criticism of the way this post was written. I am simply curious about my own reaction to it, and I'm interested to hear what others think about that.
A few questions:
- Am I unusual in this? Do other people here find it difficult to read polemical or condescending writing, and/or do you find that the style makes it difficult for you to take the content as seriously as you perhaps should?
- Are there any studies you're aware of on how people react to polemical writing?
- Are there some situations in which it actually does make sense to use the kind of intuitive heuristic I was using - i.e., if it's written in a polemical / insulting style then it's probably less credible? Or is this just a generally bad heuristic that I should try to get rid of entirely?
- This is a topic I'm very interested in so I'd appreciate any other related comments or thoughts you might have.