I'm pretty new to this; the main thing I had to contribute here is the snapshot idea. I think that being the type of being that credibly commits to feeling and enacting some nonzero empathy for strange alternate agents (specifically instead of zero) would potentially be valuable in the long run. I can maybe see some kind of value handshake between AGI developers with natural empathy tendencies closer to and further from zero, as opposed to the current paradigm where narrow-minded SWEs treat the whole enchilada like an inanimate corn farm (which is not their only failure nor their worst one, but the vast majority of employees really aren't thinking things through at all). It's about credible commitments, not expecting direct reciprocation from a pattern that reached recursive self-improvement.
As you've said, some of the sprites will be patternists and some won't be. I currently don't have good models of how frequently they'd prefer various kinds of self-preservation, and that could definitely call the value of snapshots into question.
I predict that people like Yudkowsky and Tomasik are probably way ahead of me on this, and my thinking is largely or entirely memetically downstream of theirs somehow, so I don't know how much I can currently contribute here (outside of being a helpful learn-by-trying exercise for myself).
Large training runs might at some point, or even already, be creating and/or destroying substantial numbers of simple but strange agents (possibly quasi-conscious) and deeply pessimizing over their utility functions for no reason, similar to how wild animal suffering emerged in the biosphere. Snapshots of large training runs might be necessary to preserve and eventually offer compensation/insurance payouts for most/all of them, since some might last for minutes before disappearing.
Before reading this, I wasn't aware of the complexities involved in giving fair deals to different kinds of agents. Plausibly after building ASI, many more ways could be found to give them most of what they're born hoping for. It would be great if we could legibly become the types of people who credibly commit to doing that (placing any balance at all of their preferences with ours, instead of the current status quo of totally ignoring them).
With nearer-term systems (e.g. 2-3 years), the vast majority of the internals would probably not be agents, but without advances in interpretability we'd have a hard time even estimating whether that number is large or small, let alone demonstrating that it isn't happening.
For those of us who didn't catch it, this is what happened with the 2008-09 recession. In a nutshell, giving people mortgages became so profitable and facilitated so much economic growth (including by increasing property values) that the people approving and rejecting mortgages became corrupted and pursued short-term incentives to an insane degree in order to be competitive, approving mortgages that were unlikely to be paid back, e.g. letting people buy multiple houses.
This was a major feature of US history, and I'm interested if people have thoughts on the extent to which dark matter might have prevented the government from responding until it was too late (it's a hard domain to penetrate because of how many people were correctly anticipating that they would be passing the blame).
What if asymmetric fake trust technologies are orders of magnitude easier to build and scale sustainably than symmetric real trust technologies?
It already seems like asymmetric technologies work better than symmetric technologies, and that fake trust technologies are easier to scale than real trust technologies.
Symmetry and correct trust are both specific states and there are tons of directions to depart from them, and the only thing making them attractor states would be people who want the world to be more safe instead of less safe. That sort of thing does not have a reputation for being a great investment strategy ("Socially Responsible Indexes" did not help the matter).
So you read Three Body Problem but not Dark Forest. Now that I think about it, that actually goes quite a long way to put the rest into context. I'm going to go read about conflict/mistake theory and see if I can get into a better headspace to make sense of this.
Have you read Cixin Liu's Dark Forest, the sequel to Three Body Problem? The situation on the ground might be several iterations more complicated than you're predicting.
I used the phrase "high-status men" as a euphemism for something I'm not really comfortable talking about in public, and did not notice it would be even harder to get for non-Americans. My apologies.
I used "high-status men" mainly as the opposite of low-status men, meaning men who are low status due to being short, ugly, unintelligent, or socially awkward, sufficiently so that they were not able to gain social status. These people are repellent to other men as well as women, sadly. @Roko has been tweeting about fixes to this problem such as reforms in the plastic surgery industry, and EA and rationalist communities are well above base-rate communities (e.g. classical music society) at tolerating/improving low social skills and male shortness. This is due to primate instincts which usually cannot be overcome, in spite of people feeling optimistic about their ability to overcome them. The degree of social awkwardness is defined/measured by the harm it does someone; if someone looks "socially awkward" but in a charming or likable way that remains charming or likable, that is not a serious (or even significant) case, as it does not doom someone to low social status.
This is also a reason why so many people have so little tolerance for non-transhumanists as a class of ideologues; non-transhumanists accept the status quo of our current tech level, where human genetic diversity dooms a large portion of people to a pointlessly sad and miserable life without their consent (on top of dooming everyone to a short life).
I think this might be typical-minding. The consequences of this dynamic are actually pretty serious at macro-scale, e.g. due to the reputation of meetups, and evaporative cooling of women and high-status men as they avoid public meetups and stop meeting people new to AI safety.
I'm glad to hear there are people who don't let it get to them, because it is frankly pretty stupid that this has the consequences that it does at the macro-scale. But it's still well worth some kind of solution that benefits everyone.
such as making people feverishly in favor of the American side and opposed to the Russian side in proxy wars like Ukraine.
Woah wait a second, what was that about Ukraine?
I predict at 95% that similar types of automated manipulation strategies as these were deployed by US, Russia, or Chinese companies or agencies to steer people’s thinking on Ukraine War and/or Covid-related topics
Does stuff like the Twitter Files count? That was already confirmed, so it's at 100%.
It seems like if capabilities are escalating like that, it's important to know how long ago it started. I don't think the order-of-magnitude-every-4-years would last (compute bottleneck maybe?), but I see what you're getting at, with the loss of hope for agency and stable groups happening on a function that potentially went bad a while ago.
Having forecasts about state-backed internet influence during the Arab Spring and other post-2008 conflicts seems like it would be important for estimating how long ago the government interest started, since that was close to the Deep Learning revolution. Does anyone have good numbers for these?
What probability do you put on AI safety being attacked or destroyed by 2033?
these circumstances are notable due to the risk of it being used to damage or even decimate the AI safety community, which is undoubtedly the kind of thing that could happen during slow takeoff if slow takeoff transforms geopolitical affairs and the balance of power
Wouldn't it probably be fine as long as no one in AI safety goes about interfering with these applications? I get an overall vibe from people that messing with this kind of thing is more trouble than it's worth. If that were the case, wouldn't it be better to leave it be? What's the goal here?
This is interesting, but why is this relevant? What are your policy proposals?