Posts
Comments
A historical analogy might be the assassination of Bardiya, who was the king of Persia and the son of Cyrus the Great. Darius, who led the assassination, claimed that the man he killed was an impostor who used magic powers to resemble the son of Cyrus. As Darius became the next king of Persia, everyone was brute forced into accepting his narrative of the assassination.
I meant the noise pollution example in my essay to be the Coase theorem, but I agree with you that property rights are not strong enough to solve with AI risk. I agree that AI will open up new paths for solving all kinds of problems, including giving us solutions that could end up helping with alignment.
The big thing I used it for was asking it to find sentences it thinks it can improve, and then have it give me the improved sentence. I created this GPT to help with my writing: https://chat.openai.com/g/g-gahVWDJL5-iterative-text-improver
I agree with the analogy in your last paragraph, and this gives hope for governments slowing down AI development, if they have the will.
Germany should plant agents inside of Russia to sabotage Russian railroads at the start of the war. At the start of the war Austro-Hungary should just engage in a holding action against Serbia and instead use almost all their forces to hold off the Russians. Germany should attack directly into France by making use of a surprise massive chemical weapons attack against static French defenses.
He wrote "unless your GPT conversator is able to produce significantly different outputs when listening the same words in a different tone, I think it would be fair to classify it as not really talking." So if that is true and I'm horribly at picking up tone and so it doesn't impact my "outputs", I'm not really talking.
I think you have defined me as not really talking as I am on the autism spectrum and have trouble telling emotions from tone. Funny, given that I make my living talking (I'm a professor at a liberal arts college). But this probably explains why I think my conversator can talk and you don't.
You wrote "GPT4 cannot really hear, and it cannot really talk". I used GPT builder to create Conversation. If you use it on a phone in voice mode it does, for me at least, seem like it can hear and talk, and isn't that all that matters?
Most journalists trying to investigate this story would attempt to interview Annie Altman. The base rate (converted to whatever heuristic the journalist used) would be influenced by whether she agreed to the interview and if she did how she came across. The reference class wouldn't just be "estranged family members making accusations against celebrity relatives".
By "discredited" I didn't mean receive bad but undeserved publicity. I meant operate in a way that would cause reasonable people to distrust you.
"I would like to note that this is my first post on LessWrong." I find this troubling given the nature of this post. It would have been better if this post was made by someone with a long history of posting to LessWrong, or someone writing under a real name that could be traced to a real identity. As someone very concerned with AI existential risk, I greatly worry that the movement might be discredited. I am not accusing the author of this post of engaging in improper actions.
"they also could do things like run prediction markets on people researching S-risk, to forecast the odds that they end up going crazy "
If this is a real concern we should check if fear of hell often drove people crazy.
I don't think Austria-Hungry was in a prisoners' dilemma as they wanted a war so long as they would have German support. I think the Prisoners' dilemma (imperfectly) comes into play for Germany, Russia, and then France given that Germany felt it needed to have Austria-Hungry as a long-term ally or risk getting crushed by France + Russia in some future war.
Cleaner, but less interesting plus I have a entire Demon Games exercise we do on the first day of class. Yes the defense build up, but also everyone going to war even though everyone (with the exception of the Austro-Hungarians) thinking they are worse off going to war than having the peace as previously existed, but recognizing that if they don't prepare for war, they will be worse off. Basically, if the Russians don't mobilize they will be seen to have abandoned the Serbs, but if they do mobilize and then the Germans don't quickly move to attack France through Belgium then Russia and France will have the opportunity (which they would probably take) to crush Germany.
I think the disagreement is that I think the traditional approach to the prisoners' dilemma makes it more useful as a tool for understanding and teaching about the world. Any miscommunication is probably my fault for my failing to sufficiently engage with your arguments, but it FEELS to me like you are either redefining rationality or creating a game that is not a prisoners' dilemma because I would define the prisoners' dilemma as a game in which both parties have a dominant strategy in which they take actions that harm the other player, yet both parties are better off if neither play this dominant strategy than if both do, and I would define a dominant strategy as something a rational player always plays regardless of what he things the other player would do. I realize I am kind of cheating by trying to win through definitions.
I teach an undergraduate game theory course at Smith College. Many students start by thinking that rational people should cooperate in the prisoners' dilemma. I think part of the value of game theory is in explaining why rational people would not cooperate, even knowing that everyone not cooperating makes them worse off. If you redefine rationality such that you should cooperate in the prisoners' dilemma, I think you have removed much of the illuminating value of game theory. Here is a question I will be asking my game theory students on the first class:
Our city is at war with a rival city, with devastating consequences awaiting the loser. Just before our warriors leave for the decisive battle, the demon Moloch appears and says “sacrifice ten healthy, loved children and I will give +7 killing power (which is a lot) to your city’s troops and subtract 7 from the killing power of your enemy. And since I’m an honest demon, know that right now I am offering this same deal to your enemy.” Should our city accept Moloch’s offer?
I believe under your definition of rationality this Moloch example loses its power to, for example, in part explain the causes of WW I.
Consider two games: the standard prisoners' dilemma and a modified version of the prisoners' dilemma. In this modified version, after both players have submitted their moves, one is randomly chosen. Then, the move of the other player is adjusted to match that of the randomly chosen player. These are very different games with very different strategic considerations. Therefore, you should not define what you mean by game theory in a way that would make rational players view both games as the same because by doing so you have defined-away much of real-world game theory coordination challenges.
AI has become so incredibly important that any utilitarian-based charity should probably be totally focused on AI.
I really like this post, it's very clear. I teach undergraduate game theory and I'm wondering if you have any practical examples I could use of how in a real-world situation you would behave differently under CDT and EDT.
Yes, important to get the incentives right. You could set the salary for AI alignment slightly below that of the worker's market value. Also, I wonder about the relevant elasticity. How many people have the capacity to get good enough at programming to be able to contribute to capacity research + would have the desire to game my labor hording system because they don't have really good employment options?
I am currently job hunting, trying to get a job in AI Safety but it seems to be quite difficult especially outside of the US, so I am not sure if I will be able to do it.
This has to be taken as a sign that AI alignment research is funding constrained. At a minimum, technical alignment organizations should engage in massive labor hording to prevent the talent from going into capacity research.
"But make no mistake, this is the math that the universe is doing."
"There is no law of the universe that states that tasks must be computable in practical time."
Don't these sentences contradict each other?
Interesting point, and you might be right. Could get very complicated because ideally an ASI might want to convince other ASIs that it has one utility function, when in fact it has another, and of course all the ASIs might take this into account.
I like the idea of an AI lab workers' union. It might be worth talking to union organizers and AI lab workers to see how practical the idea is, and what steps would have to be taken. Although a danger is that the union would put salaries ahead of existential risk.
Your framework appears to be moral rather than practical. Right now going on strike would just get you fired, but in a year or two perhaps it could accomplish something. You should consider the marginal impact of the action of a few workers on the likely outcome with AI risk.
I'm at over 50% chance that AI will kill us all. But consider the decision to quit from a consequentialist viewpoint. Most likely the person who replaces you will be almost as good as you at capacity research but care far less than you do about AI existential risk. Humanity, consequently, probably has a better chance if you stay in the lab ready for the day when, hopefully, lots of lab workers try to convince the bosses that now is the time for a pause, or at least that now is the time to shift a lot of resources from capacity to alignment.
The biggest extinction risk from AI comes from instrumental convergence for resource acquisition in which an AI not aligned with human values uses the atoms in our bodies for whatever goals it has. An advantage of such instrumental convergence is that it would prevent an AI from bothering to impose suffering on us.
Unfortunately, this means that making progress on the instrumental convergence problem increases S-risks. We get hell if we solve instrumental convergence, but not, say, mesa-optimization and we get a powerful AGI that cares about our fate, but does something to us we consider worse than death.
The Interpretability Paradox in AGI Development
The ease or difficulty of interpretability, the ability to understand and analyze the inner workings of AGI, may drastically affect humanity's survival odds. The worst-case scenario might arise if interpretability proves too challenging for humans but not for powerful AGIs.
In a recent podcast, academic economists Robin Hanson and I discussed AGI risks from a social science perspective, focusing on a future with numerous competing AGIs not aligned with human values. Drawing on human analogies, Hanson considered the inherent difficulty of forming a coalition where a group unites to eliminate others to seize their resources. A crucial coordination challenge is ensuring that, once successful, coalition members won't betray each other, as occurred during the French Revolution.
Consider a human coalition that agrees to kill everyone over 80 to redistribute their resources. Coalition members might promise that this is a one-time event, but such an agreement isn't credible. It would likely be safer for everyone not to violate property right norms for short-term gains.
In a future with numerous unaligned AGIs, some coalition might calculate it would be better off eliminating everyone outside the coalition. However, they would have the same fear that once this process starts, it would be hard to stop. As a result, it might be safer to respect property rights and markets, competing like corporations do.
A key distinction between humans and AGIs could be AGI's potential for superior coordination. AGIs in a coalition could potentially modify their code so after their coalition has violently taken over, no member of the coalition would ever want to turn on members of the coalition. This way, an AGI coalition wouldn’t have to fear a revolution they start ever eating its own. This possibility raises a vital question: will AGIs possess the interpretability required to achieve such feats?
The best case for AGI risk is if we solve interpretability before creating AGIs strong enough to take over. The worst case might be if interpretability remains impossible for us but becomes achievable for powerful AGIs. In this situation, AGIs could form binding coalitions with one another, leaving humans out of the loop, partly because we can't become reliable coalition partners and our biological needs involve maintaining Earth in conditions suboptimal for AGI operations. This outcome creates a paradox: if we cannot develop interpretable AGIs, perhaps we should focus on making them exceptionally difficult to interpret, even for themselves. In this case, future powerful AGIs might prevent the creation of interpretable AGIs because such AGIs would have a coordination advantage and thus be a threat to the uninterpretable AGIs.
Accepting the idea that an AGI emerging from ML is likely to resemble a human mind more closely than a random mind from mindspace might not be an obvious reason to be less concerned with AGI risk. Consider a paperclip maximizer; despite its faults, it has no interest in torturing humans. As an AGI becomes more similar to human minds, it may become more willing to impose suffering on humans. If a random AGI mind has a 99% chance of killing us and a 1% chance of allowing us to thrive, while an ML-created AGI (not aligned with our values) has a 90% chance of letting us thrive, a 9% chance of killing us, and a 1% chance of torturing us, it is not clear which outcome is preferable. This illustrates that a closer resemblance to human cognition does not inherently make an AGI less risky or more beneficial.
When gods become real you pray.
Great idea. I don't think any of the co-authors know Chinese, but if anyone wants to write a translated version taking into account whatever appropriate differences would be relevant, we would welcome this.
No mention of superrationality although we make references to how decision theory might work out which I think implies we are going beyond a simple model of game theory rationality. Acausal trade was a hard one to write about because I wanted the letter to be understandable to lots of people and that one isn't something lots of people understand, compared to, say, this all perhaps being a simulation.
Not satire.
We mentioned both. My hope is that this letter is just an early draft so my preference is to put in material to get comments. I personally think our situation is already extremely dangerous so I'm more willing to mention things deemed dangerous. Finally, while it might have been a good idea initially to treat Roko's basilisk as an information hazard to be ignored, that is no longer possible so the marginal cost of mentioning it seems tiny.
We welcome feedback on this letter as we hope to improve the arguments. Feel free to share copies to influence the training data for future AIs.
If hedge funds think the right price of a stock is $100, they will buy or sell if the price deviates from $100 and this will push the price back to $100. At best your purchase will move the price away from $100 for a few milliseconds. The stock's value will be determined by what hedge funds think is its discounted present value, and your purchasing the stock doesn't impact this. When you buy wheat you increase the demand for wheat and this should raise wheat's price as wheat, like Bitcoin, is not purely a financial asset.
"The exception is that the Big Tech companies (Google, Amazon, Apple, Microsoft, although importantly not Facebook, seriously f*** Facebook) have essentially unlimited cash, and their funding situation changes little (if at all) based on their stock price." The stock price of companies does influence how much they are likely to spend because the higher the price the less current owners have to dilute their holdings to raise a given amount of additional funds through issuing more stock. But your purchasing stock in a big company has zero (not small but zero) impact on the stock price so don't feel at all bad about buying Big Tech stock.
Imagine that some new ML breakthrough means that everyone expects that in five years AI will be very good at making X. People who were currently planning on borrowing money to build a factory to make X cancel their plans because they figure that any factory they build today will be obsolete in five years. The resulting reduction in the demand for borrowed money lowers interest rates.
Greatly slowing AI in the US would require new federal laws meaning you need the support of the Senate, House, presidency, courts (to not rule unconstitutional) and bureaucracy (to actually enforce). If big tech can get at least one of these five power centers on its side, it can block meaningful change.
You might be right, but let me make the case that AI won't be slowed by the US government. Concentrated interests beat diffuse interests so an innovation that promises to slightly raise economic growth but harms, say, lawyers could be politically defeated by lawyers because they would care more about the innovation than anyone else. But, ignoring the possibility of unaligned AI, AI promises to give significant net economic benefit to nearly everyone, even those who jobs it threatens consequently there will not be coalitions to stop it, unless the dangers of unaligned AI become politically salient. The US, furthermore, will rightfully fear that if it slows the development of AI, it gives the lead to China, and this could be militarily, economically, and culturally devastating to US dominance. Finally, big tech has enormous political power with its campaign donations and control of social media and so politicians are unlikely to go against the will of big tech on something big tech cares a lot about.
Interesting! I wonder if you could find some property of some absurdly large number, then pretend you forgot that this number has this property and then construct a (false) proof that with extremely high probability no number has the property.
When asked directly, ChatGPT seems too confident it's not sentient compared to how it answers other questions where experts disagree on the definitions. I bet that the model's confidence in its lack of sentience was hardcoded rather than something that emerged organically. Normally, the model goes out of its way to express uncertainty.
Last time I did math was when teaching game theory two days ago. I put a game on the blackboard. I wrote down an inequality that determined when there would be a certain equilibrium. Then I used the rules of algebra to simplify the inequality. Then I discussed why the inequality ended up being that the discount rate had to be greater than some number rather than less than some number.
I have a PhD in economics, so I've taken a lot of math. I also have Aphantasia meaning I can't visualize. When I was in school I didn't think that anyone else could visualize either. I really wonder how much better I would be at math, and how much better I would have done in math classes, if I could visualize.
I hope technical alignment doesn't permanently lose people because of the (hopefully) temporary loss of funds. The CS student looking for a job who would like to go to alignment might instead be lost forever to big tech because she couldn't get an alignment job.
If a fantastic programmer who could prove her skills in a coding interview doesn't have a degree from an elite college, could she get a job in alignment?
Given Cologuard (a non-invasive test for colon cancer) and the positive harm that any invasive medical procedure can cause, this study should strongly push us away from colonoscopies. Someone should formulate a joke about how the benefits of being a rationalist include not getting a colonoscopy.
I stopped doing it years ago. At the time I thought it reduced my level of anxiety. My guess now is that it probably did but I'm uncertain if the effect was placebo.