The Demon King does not solely attack the Frozen Fortress to profit on prediction markets. The story tells us that the demons engage in regular large-scale attacks, large enough to serve as demon population control. There is no indication that these attacks decreased in size when they were accompanied by market manipulation (and if they did, that would be a win in and of itself).
So the prediction market's counterfactual is not that the Demon King's forces don't attack, but that they attack at an indeterminate time with approximately the same frequency and strength. By letting the Demon King buy and profit from "demon attack on day X" shares, the Circular Citadel learns with decently high probability when these attacks take place and can allocate its resources more effectively. Hire mercenaries on days the probability is above 90%, focus on training and recruitment on days of low-but-typical probability, etc.
This ability to allocate resources more efficiently has value, which is why the Heroine organized the prediction market in the first place. The only thing that doesn't go according to the Heroine's liking is that the Circular Citadel buys that information from the Demon King rather than from 'the invisible hand of the market'.
more generally the Demon King would only do this if the information revealed weren't worth the market cost
The Demon King would sell the information as soon as she thinks it is in her best interests, which is different from it being bad for the Circular Citadel. Especially considering the Circular Citadel doesn't even have to pay the full cost of the information - everyone who bets is also paying.
It is very possible that the Demon King and the Circular Citadel both profit from the prediction market existing, while the demon ground forces and naive prediction market bettors lose.
That's probably the only "military secret" that really matters.
The soldiers guarding the outer wall and the Citadel treasurer that pays their overtime wages would beg to differ.
Yes, that is the price she got for giving the information.
I think that AI people that are very concerned about AI risk tend to view loss of control risk as very high, while eternal authoritarianism risks are much lower.
I'm not sure how many people see the risk of eternal authoritarianism as much lower and how many people see it as being suppressed by the higher probability of loss of control[1]. Or in Bayesian terms:
P(eternal authoritarianism) = P(eternal authoritarianism | control is maintained) P(control is maintained)
Both sides may agree that P(eternal authoritarianism | control is maintained) is high, only disagreeing on P(control is maintained).
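Spelled out (my own expansion; the assumption doing the work is that losing control to AI rules out lasting human authoritarianism):

```latex
% Law of total probability; the second term is the one implicitly dropped above,
% under the assumption that human authoritarianism requires control being maintained.
\[
P(\text{eternal auth.}) = P(\text{eternal auth.} \mid \text{control})\,P(\text{control})
  + \underbrace{P(\text{eternal auth.} \mid \neg\text{control})\,P(\neg\text{control})}_{\approx\, 0}
\]
```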
[1] Here, 'control' is short for all forms of ensuring AI alignment to humans, whether all or some or one.
As far as I understand, "a photon in a medium" is a quasiparticle. Actual photons always travel at the speed of light; the "photon" that travels through glass at a lower speed is the sum of an incredibly complicated process that cancels out into something that, if you squint a little, can be described as one or several particles, because the energy of the electromagnetic field excitation can't be absorbed by the transparent material and because of conservation of momentum.
The model of the photon "passing by atoms and plucking them" is a lie to children, an attempt to describe a quantum phenomenon in classically comprehensible terms. As such, what "the momentum of the quasiparticle" is depends on what you consider to be part of the quasiparticle, or which parts of the quasiparticle(s) you measure when doing an experiment.
Specifically, for the mirror-in-a-liquid: when light hits the liquid it is refracted. That refraction makes the path of the light steeper, which means the mirror has to reflect momentum that is entering at a steeper angle, and so the momentum the mirror measures is multiplied by the refractive index. At the quasiparticle level, when hitting the liquid interface, the light interacts with the quasiparticle phonon [sic] field of the liquid, exchanging momentum to redirect the quasiparticle light, and the mirror has to reflect both the quasiparticle light and the phonon field, resulting in the phonons being included in "the momentum of the light".
However, for the light-through-a-glass-fiber, you are measuring the momentum of the phonons as part of the not-light, because the phonons are part of the medium, and thus part of the glass fiber getting nudged by the light beam.
I'm not sure how this works out in rigorous calculation, but this is my intuition pump for a [1]-ish answer.
Though compare and contrast Dune's test of the gom jabbar:
You've heard of animals chewing off a leg to escape a trap? There's an animal kind of trick. A human would remain in the trap, endure the pain, feigning death that he might kill the trapper and remove a threat to his kind.
Even if you are being eaten, it may be right to endure it so that you have an opportunity to do more damage later.
You're suggesting angry comments as an alternative to mass retributive downvoting. That easily implies mass retributive angry comments.
As for policing against systemic bias in policing, that's a difficult problem that society struggles with in many different areas because people can be good at excusing their biases. What if one of the generals genuinely makes a comment people disagree with? How can you determine to what extent people's choice to downvote was due to an unauthorized motivation?
It seems hard to police without acting draconically.
Just check their profile for posts that do deserve it that you were previously unaware of. You can even throw a few upvotes at their well-written comments. It's not brigading, it's just a little systemic bias in your duties as a user with upvote-downvote authority.
Are you trying to prime people to harass the generals?
Besides, it's not mass downvoting, it's just that the increased attention to their accounts revealed a bunch of poorly written comments that people genuinely disagree with and happen to independently decide are worthy of a downvote :)
"why not just" is a standard phrase for saying what you're proposing would be simple or come naturally if you try. Combined with the rest of the comment talking about straightforwardness and how little word count, and it does give off a somewhat combatitive vibe.
I agree with your suggestion and it is good to hear that you don't intend it imply that it is simple, so maybe it would be worth editing the original comment to prevent miscommunication for people who haven't read it yet. For the time being I've strong-agreed with your comment to save it from a negativity snowball effect.
No. I would estimate that there are fewer rich people willing to sacrifice their health for more income than there are poor people willing to do the same. Rich people typically take more holidays, report higher job satisfaction, suffer fewer stress-related ailments, and spend more time and money on luxuries rather than reinvesting into their careers (including paying basic cost of living to be employable).
And not for lack of options. CEOs can get involved with their companies and provide useful labor by putting their nose to the grindstone, or kowtow to investors for growth opportunities. Investors can put a lot of labor into finding financial advisors and engaging in corporate intrigue to get an advantage on the market. Celebrities can work on their performance and their image to become more popular and get bigger signing deals.
Perhaps to clarify, "the grind" isn't absolute economic value or hours worked, it's working so hard that it cannibalizes other things you value.
There are rich people pushing themselves to work 60+ hour weeks, struggling to keep a smile on their face while people insult and demean them. And there are poor people who live as happy ascetics, enjoying the company of their fellows and eating simple meals, choosing to work few hours even if it means forgoing many things the middle class would call necessities.
There are more rich people that choose to give up the grind than poor people. It's tougher to accept a specific form of suffering if you see that 90% of your peers are able to solve the suffering with work than if 1% of your peers can. Right now, accepting your own mortality is normal even for rich people, but as soon as the 20% richest can live forever, suddenly everyone who isn't immortal will feel poor for lacking it. Maybe some will still embrace death rather than endure professional abuse 60 hours per week, but that is suddenly a much harder decision.
My first order guess for the mental definition of poverty would be if >X% of the population in your polity is able to afford a solution to various intense forms of suffering, but you can't. (Where X seems to be around 50-80%)
A rich 5th century BCE Athenian was a man for whom any illness was very likely fatal, who lost half his children between ages ½ and 18, whose teeth were slowly ground down to the roots by milldust, who lived one bad harvest away from famine, and more. But now we characterize poor people by their desperate struggle to escape the things the Athenian rich person accepted as normal and inevitable.
I can see two ways UBI may solve poverty:
1. Given UBI makes up a sufficiently high fraction of all income, the poorest person does not feel like their wealth differs enough from that of everyone else to conceptualize their suffering as poverty.
2. Given sufficiently advanced technology, the UBI is enough to solve all major forms of suffering, from death to emotional abuse trauma, and even those with less stuff don't feel like they're missing out.
Option (1) may be possible already, especially in more laid-back cultures, but if not then UBI won't solve poverty yet.
All of that said, why is this the standard you choose to measure it by? Even if UBI doesn't solve poverty, it can relieve suffering. And while other anxieties might take the place of the old ones on the hedonic treadmill, it does feel like life is better when sources of suffering are taken away. If you offered me $5000 on the condition that, whenever I get depressed, a child gets eaten alive by wolves before my eyes, I would say no, so apparently I do prefer my worst days over a hunter-gatherer's worst days.
Quality of life is what matters. UBI can take out financial anxiety and poverty traps, and likely improve exercise and physical health and nutrition and social satisfaction (assuming the economy doesn't collapse). Those are all things anyone could recognize as valuable, even if they have more urgent matters themselves.
You seem to approach the possible existence of a copy like a premise, with the question being whether that copy is you. However, what if we reverse that? Given we define 'a copy of you' as another one of you, how certain is it that a copy of you could be made given our physics? What feats of technology are necessary to make that copy?
Also, what would we need to do to verify that a claimed copy is an actual copy? If I run ChatGPT-8 and ask it to behave like you would behave based on your brain scan and it manages to get 100% fidelity in all tests you can think of, is it a copy of you? Is a copy of you inside of it? If not, in what ways does the computation that determines an alleged copy's behavior have to match your native neuronal computation for it to be or contain a copy? Can a binary computer get sufficient fidelity?
I'm fine with uploading to a copy of myself. I'm not as optimistic about a company with glowing reviews offering to upload/make a copy of me.
In the post you mention Epistemic Daddies, mostly describing them as sources that are deferred to for object-level information.
I'd say there is also a group of people who seek Epistemic Mommies. People looking for emotional assurance that they're on the right path and their contribution to the field is meaningful; for assurance that making mistakes in reasoning is okay; for someone to do the annoying chores of epistemic hygiene so they can make the big conclusions; for a push to celebrate their successes and show them off to others; etc.
Ultimately, both are about deferring to others (Epistemic Parents?) for information, but Epistemic Mommies are deferred to for procedural information. GiveWell makes for a tempting Epistemic Daddy, but the LessWrong sequences make for a tempting Epistemic Mommy.
Your advice is pretty applicable to both, but I feel like the generalization makes it easier to catch more examples of unhealthy epistemic deference.
I kind of... hard disagree?
Effective Samaritans can't be a perfect utility inverse of Effective Altruists while keeping the labels of 'human', 'rational', or 'sane'. Socialism isn't the logical inverse of Libertarianism; both are different viewpoints on how to achieve the common goal of societal prosperity.
Effective Samaritans won't sabotage an EA social experiment any more than Effective Altruists will sabotage an Effective Samaritan social experiment. If I received a letter from Givewell thanking me for my donation that was spent on sabotaging a socialist commune, I would be very confused - that's what the CIA is for. I frankly don't expect either the next charter city or the next socialist commune to produce a flourishing society, but I do expect both to give valuable information that would allow both movements and society in general to improve their world models.
Also, our priors are not entirely trapped. It can seem that way because true political conversion rarely happens in public, and often not even consciously, but people join or leave movements regularly when their internal threshold is passed. Effective Altruist/Samaritan forums will always defend Effective Altruism/Samaritanism as long as there is one EA/ES on earth, but if evidence (or social condemnation) builds up against it, people whose thresholds are reached will just leave. Of course the movement as a whole can also update, but not in the same way as individual members.
People do tend to be resistant to evidence that goes against their (political) beliefs, but that resistance gets easier to surmount as fewer people remain who are less fanatical about the movement than you are. Active rationalist practices like doubt can also help get your priors unstuck.
So in a world with EAs and ESs living side by side, there would constantly be people switching from one movement to the other. And as either ES or EA projects get eaten by a grue, one of these rates will be greater than the other until one or both movements have too few supporters to do much of anything. This may have already happened (at least for the precise "Effective Samaritan" ideology of rationalism/bayesianism + socialism + individual effective giving).
So I don't think we need to resort to frequentism. Sure we can use frequentism to share the same scientific journal, but in terms of cooperation we can just all run our own experiments, share the data, and regularly doubt ourselves.
On third order, people who openly worry about X-Risk may get influenced by their environment, becoming less worried as a result of staying with a company whose culture denies X-Risk, which could eventually even cause them to contribute negatively to AI Safety. Preventing them from getting hired prevents this.
That sounds like a cross between learned helplessness and madman theory.
The madman theory angle is "If I don't respond well to threats of negative outcomes, people (including myself) have no reason to threaten me". The learned helplessness angle is "I've never been able to get good sets of tasks and threats, and trying to figure something out usually leads to more punishment, so why put in any effort?"
Combine the two and you get "Tasks with risks of negative outcomes? Ugh, no."
With learned helplessness, the standard mechanism for (re)learning agency is being guided through a productive sequence by someone who can ensure the negative outcomes don't happen, getting more and more control over the sequence each time until you can do it on your own, then adapting it to more and more environments.
Avoiding tasks with possible negative outcomes isn't really feasible, so getting hands-on help with handling threat of negative consequences seems useful. Probably from a mental coach or psychologist.
The app doesn't help people who struggle with setting reasonable tasks with reasonable rewards and punishments. Akrasia is an umbrella term for "something somewhere in the chain to actually getting to do things is stopping the process", so it makes sense that one person's "solution" to akrasia isn't going to work for a lot of people.
I think it's healthy to see these kinds of posts as procedural inspiration. As a reader it's not about finding something that works for you, it's about analysing the technique someone used to iterate on their first hint of a good idea until it became something that thoroughly helped them.
I'd say "fuck all the people who are harming nature" is black-red/rakdos's view of white-green/selesnya. The "fuck X" attitude implies a certain passion that pure black would call wasted motion. Black is about power. It's not adversarial per se, just mercenary/agentic. Meanwhile the judginess towards others is an admixture of white. Green is about appreciating what is, not endorsing or holding on to it.
Black's view of green is "careless idiots, easy to take advantage of if you catch them by surprise". When black meets green, black notices how the commune's rules would allow someone to scam them of all their cash and how the charms they're wearing cost 10 times less to produce than what they paid for it.
Black-red/Rakdos' view of green is "tree huggers, weirdly in love with nature rather than everything else you can care about". When rakdos meets green they're inspired to throw a party in the woods, concluding it's kinda lame without a lightshow or a proper toilet, and leaving tons of garbage when they return home.
Black's view of white-green/selesnya is "people who don't seem to grasp the tragedy of the commons, and who can get obnoxiously intrusive about it. Sure, nature can be nice, but it's not worth wasting that much political capital on." When black meets selesnya, it tries to find an angle by which selesnya can give it more power. Maybe an ecological development grant with shoddy selection criteria, or a lopsided political deal.
Meanwhile black-green/golgari is "It is natural for people to be selfish. Everyone chooses themselves eventually, that's an evolutionary given. I will make selfish choices and appreciate the world, as any sane person would". It views selesnya as a grift, green as passive, black as self-absorbed, and rakdos as irrational.
I would say ecofascism is white-green-black/Abzan. The hard agency of black, the communal approach of white, and the appreciation of nature of green, but lacking the academic rigor of blue or the wild passion of red.
Fighting with tigers is red-green, or Gruul by MTG terminology. The passionate, anarchic struggle of nature red in tooth and claw. Using natural systems to stay alive even as it destroys is black-green, or Golgari. Rot, swarms, reckless consumption that overwhelms.
Pure green is a group of prehistoric humans sitting around a campfire sharing ghost stories and gazing at the stars. It's a cave filled with handprints of hundreds of generations that came before. It's cats lounging in a sunbeam or birds preening their feathers. It's rabbits huddling up in their dens until the weather is better, it's capybaras and monkeys in hot springs, and bears lazily going to hibernate. These have intelligible justifications, sure, but what do these animals experience while engaging in these activities?
Most vertebrates seem to have a sense of green, of relaxation and watching the world flow by. Physiologically, when humans and other animals relax, the sympathetic nervous system is suppressed and the parasympathetic system stays/becomes active. This causes the muscles to relax and the blood stream to prioritize digestion. For humans at least, stress and the pressure to find solutions right now decrease, and the mind wanders. Attention loses its focus but remains high-bandwidth. This green state is where people most often come up with 'creative' solutions that draw on a holistic understanding of the situation.
Green is the notion that you don't have to strive towards anything, and the moment an animal does need to strive for something they mix in red, blue, black, or white, depending on what the situation calls for and the animal's color evolved toolset.
The colors exist because no color on its own is viable. Green can't keep you alive, and that's okay, it isn't meant to.
That doesn't seem like a good idea. You're ignoring long-term harms and benefits of the activity - otherwise cycling would be net positive - and you're ignoring activity duration. People don't commute to work by climbing Mount Everest or going skydiving.
I don't think it's precisely true. The serene antagonism that comes from having examined something and recognizing that it is worth taking your effort to destroy is different from the hot rage of offense. But of the two, I expect antagonism to be more effective in the long term.
- Rage is accompanied by a surge of adrenalin, sympathetic nervous activation, and usually parasympathetic nervous suppression, which is not sustainable in the long term. Antagonism is compatible with physiological rest and with changes in the environment.
- Consequently, antagonism has access to system 2 and long term planning, while rage tends to have a short term view with limited information processing capabilities.
- Even when your antagonism calls for rapid physical action and rage, having a better understanding of the situation prevents you from being held back by doubt when you encounter (emotional) evidence that doesn't fit your current tack. The release of adrenalin and start of rage can then reliably be triggered by the feeling that you have unhindered access to the object of hatred.
- It's also possible when coming from calm antagonism to choose between rage and the state of both high parasympathetic and high sympathetic activation, where you're active but still have high sensory processing bandwidth (see also runner's high, sexual activity, or being 'in the zone' with sports or high-apm games), which for anger might be called pugnacity or bloodlust or simply an eagerness to fight.
Rage is good for punching the baddies in front of you in the face if you can take them in a straight fight. Pugnacity is good for systematically outmaneuvering their defenses and finding the path to victory in combat. Antagonism is good for making their death a week from now look like an accident, or to arrange a situation where rage and pugnacity can do their jobs unhindered.
but people recently have been arguing to me that the coming and going of emotions is a much more random process influenced by chemicals and immediate environment and so on.
I don't feel like 'random' is an accurate word here. 'Stochastic' might be better. Environmental factors like interior design and chemical influences like blood sugar have major effects, but these effects are enumerable and vary little across cultures, ages, etc.
Given how stochastic your emotional responses are, it's best not to rely on the intense emotions for any sort of judgment. If you can't tell whether you're raging because someone said something intolerable or because your blood sugar is low so your parasympathetic nervous activation is low so you couldn't process the nuance of their statements, better not act on that rage until you've had something to eat. If you can't tell whether you're fine with what someone said because they probably didn't mean it as badly as it sounds or because you're tired so your sympathetic nervous activation is low, better not commit to that condonement until you've had a nap.
As far as I can tell, the AI has no specialized architecture for deciding about its future strategies or giving semantic meaning to its words. It outputting the string "I will keep Gal a DMZ" does not have the semantic meaning of it committing to keep troops out of Gal. It's just the phrase that players who are most likely to win use in that board state, given its internal strategy.
Just as chess grandmasters were outperformed by a simple search tree when chess was supposed to be the peak of human intelligence, I think this will have the same disenchanting effect on the game of Diplomacy. Humans are not decision-theoretical geniuses; just saying whatever people want to hear while playing optimally for yourself is sufficient to win. There may be a level of play where decision theory and commitments are relevant, but humans just aren't that good.
That said, I think this is actually a good reason to update towards freaking out. It's happened quite a few times now that 'naive' big milestones have been hit unexpectedly soon "without any major innovations or new techniques" - chess, Go, StarCraft, Dota, GPT-3, DALL-E, and now Diplomacy. It's starting to look like humans are less complicated than we thought - more like a bunch of current-level AI architectures squished together in the same brain (with some capacity to train new ones in deployment) than like a powerful generally applicable intelligence. Or a room full of toddlers with superpowers, to use the CFAR phrase. While this doesn't increase our estimates of the rate of AI development, it does suggest that the goalpost for superhuman intellectual performance in all areas is closer than we might have thought otherwise.
Dear M.Y. Zuo,
I hope you are well.
It is my experience that the conventions of e-mail are significantly more formal and precise in expectation when it comes to phrasing. Discord and Slack, on the other hand, have an air of informal chatting, which makes it feel more acceptable to use shortcuts and to phrase things less carefully. While feelings may differ between people and conventions between groups, I am quite confident that these conventions are common due to both media's origins, as a replacement for letters and memos and as a replacement for in-person communication respectively.
Don't hesitate to ask if you have any further questions.
Best regards,
Daphne Will
I don't think that's really true. People are a lot more informal on Discord than e-mail because of where they're both derived from.
That's a bit of a straw man, though to be fair it appears my question didn't fit into your world model as it does in mine.
For me, the insurrection was in the top 5 most informative/surprising US political events in 2017-2021. On account of its failure it didn't have consequences as major as the others, but it caused me to update my world model more. It was a sudden confrontation with the size and influence of anti-democratic movements within the Republican party, which I consider Trump to be sufficiently associated with to cringe from the notion of voting for him.
The core of my question is whether your world model has updated from
Given our invincible military, the only danger to us is a nuclear war (meaning Russia).
For me, the January insurrection was a big update away from that statement, so I was curious how it fit in your world model, but I suppose the insurrection is not necessarily the key. Did your probability of (a subset of) Republicans ending American democracy increase over the Trump presidency?
Noting that a Republican terrorist might still have attempted to commit acts of terror with Clinton in office does not mitigate the threat posed by (a subset of) Republicans. Between self-identified Democrats pissing off a nuclear power enough to start a world war and self-identified Republicans causing the US to no longer have functional elections, my money is on the latter.
If I had to use a counterfactual, I would propose imagining a world where the political opinions of all US citizens as projected on a left-right axis were 0.2 standard deviations further to the Left (or Right).
With Trump/Republicans I meant the full range of questions, from just Trump, through participants in the storming of Congress, to all Republican voters.
It seems quite easy for a large fraction of a population to be a threat to the population's interests if they share a particular dangerous behavior. I'm confused why you would think that would be difficult. Threat isn't complete or total. If you don't get a vaccine or wear a mask, you're a threat to immune-compromised people but you can still do good work professionally. If you vote for someone attempting to overthrow democracy, you're a danger to the nation while in the voting booth but you can still do good work volunteering. As for how the nation can survive such a large fraction working against its interests - it wouldn't, in equilibrium, but there's a lot of inertia.
It seems weird that people storming the halls of Congress, building gallows for a person certifying the transition of power, and killing and getting killed attempting to reach that person, would lead to no update at all on who is a threat to America. I suppose you could have factored this sort of thing in from the start, but in that case I'm curious how you would have updated on potential threats to America if the insurrection didn't take place.
Ultimately the definition of 'threat' feels like a red herring compared to the updates in the world model. So perhaps more concretely: what's the minimum level of violence at the insurrection that would make you have preferred Hillary over Trump? How many Democratic congresspeople would have to die? How many Republican congresspeople? How many members of the presidential chain of command (old or new)?
Hey, I stumbled on this comment and I'm wondering if you've updated on whether you consider Trump/Republicans a threat to America's interests in light of the January 6th insurrection.
People currently give MIRI money in the hopes they will use it for alignment. Those people can't explain concretely what MIRI will do to help alignment. By your standard, should anyone give MIRI money?
When you're part of a cooperative effort, you're going to be handing off tools to people (either now or in the future) which they'll use in ways you don't understand and can't express. Making people feel foolish for being a long inferential distance away from the solution discourages them from laying groundwork that may well be necessary for progress, or even from exploring.
As a concrete example of rational one-hosing, here in the Netherlands it rarely gets hot enough that ACs are necessary, but when it does a bunch of elderly people die of heat stroke. Thus, ACs are expected to run only several days per year (so efficiency concerns are negligible), but having one can save your life.
I checked the biggest Dutch-only consumer-facing online retailer for various goods (bol.com). Unfortunately I looked before making a prediction for how many one-hose vs two-hose models they sell, but even conditional on me choosing to make a point of this, it still seems like it could be useful for readers to make a prediction at this point. Out of 694 models of air conditioner labeled as either one-hose or two-hose,
3
are two-hose.
This seems like strong evidence that the market successfully adapts to actual consumer needs where air conditioner hose count is concerned.
It feels more to me like we're the quiet weird kid in high school who doesn't speak up or show emotion because we're afraid of getting judged or bullied. Which, fair enough, the school sort of is like that - just look at poor cryonics, or even nuclear power - but the road to popularity (let alone getting help with what's bugging us) isn't to try to minimize our expressions to 'proper' behavior while letting ourselves be characterized by embarrassing past incidents (e.g. Roko's Basilisk) if we're noticed at all.
It isn't easy to build social status, but right now we're trying next to nothing, and we've seen that it doesn't do enough.
Agree that it's too shallow to take seriously, but
If it answered "you would say during text input batch 10-203 in January 2022, but subjectively it was about three million human years ago" that would be something else.
only seems to capture AI that managed to gradient hack the training mechanism to pass along its training metadata and subjective experience/continuity. If a language model were sentient in each separate forward pass, I would imagine it would vaguely remember/recognize things from its training dataset without necessarily being able to place them, like a human when asked when they learned how to write the letter 'g'.
Interventions on the order of burning all GPUs in clusters larger than 4 and preventing any new clusters from being made, including the reaction of existing political entities to that event and the many interest groups who would try to shut you down and build new GPU factories or clusters hidden from the means you'd used to burn them, would in fact really actually save the world for an extended period of time and imply a drastically different gameboard offering new hopes and options.
I suppose 'on the order of' is the operative phrase here, but that specific scenario seems like it would be extremely difficult to specify an AGI for without disastrous side-effects, and like it still wouldn't be enough. Other, less efficient or less well developed forms of compute exist, and preventing humans from organizing to find a way around the GPU-burner's blacklist for unaligned AGI research, while differentially allowing them to find a way to build friendly AGI, seems like it would require a lot of psychological/political finesse on the GPU-burner's part. It's on the level of Ozymandias from Watchmen, but it's cartoonish supervillainy nonetheless.
I guess my main issue is a matter of trust. You can say the right words, as all the best supervillains do, promising that the appropriate cautions are taken above our clearance level. You've pointed out plenty of mistakes you could be making, and the ease with which one can make mistakes in situations such as yours, but acknowledging potential errors doesn't prevent you from making them. I don't expect you to have many people you would trust with AGI, and I expect that circle would shrink further if those people said they would use the AGI to do awful things iff it would actually save the world [in their best judgment]. I currently have no-one in the second circle.
If you've got a better procedure for people to learn to trust you, go ahead, but is there something like an audit you've participated in/would be willing to participate in? Any references regarding your upstanding moral reasoning in high-stakes situations that have been resolved? Checks and balances in case of your hardware being corrupted?
You may be the audience member rolling their eyes at the cartoon supervillain, but I want to be the audience member rolling their eyes at HJPEV when he has a conversation with Quirrell where he doesn't realise that Quirrell is evil.
AI can run on CPUs (with a certain inefficiency factor), so only burning all GPUs doesn't seem like it would be sufficient. As for disruptive acts that are less deadly, it would be nice to have some examples but Eliezer says they're too far out of the Overton Window to mention.
If what you're saying about Eliezer's claim is accurate, it does seem disingenuous to frame "The only worlds where humanity survives are ones where people like me do something extreme and unethical" as "I won't do anything extreme and unethical [because humanity is doomed anyway]". It makes Eliezer dangerous to be around if he's mistaken, and if you're significantly less pessimistic than he is (if you assign >10^-6 probability to humanity surviving), he's mistaken in most of the worlds where humanity survives. Which are the worlds that matter the most.
And yeah, it's nice that Eliezer claims that Eliezer can violate ethical injunctions because he's smart enough, after repeatedly stating that people who violate ethical injunctions because they think they're smart enough are almost always wrong. I don't doubt he'll pick the option that looks actually better to him. It's just that he's only human - he's running on corrupted hardware like the rest of us.
I'm confused about A6, from which I get "Yudkowsky is aiming for a pivotal act to prevent the formation of unaligned AGI that's outside the Overton Window and on the order of burning all GPUs". This seems counter to the notion in Q4 of Death with Dignity where Yudkowsky says
It's relatively safe to be around an Eliezer Yudkowsky while the world is ending, because he's not going to do anything extreme and unethical unless it would really actually save the world in real life, and there are no extreme unethical actions that would really actually save the world the way these things play out in real life, and he knows that. He knows that the next stupid sacrifice-of-ethics proposed won't work to save the world either, actually in real life.
I would estimate that burning all AGI-capable compute would disrupt every factor of the global economy for years and cause tens of millions of deaths[1], and that's what Yudkowsky considers the more mentionable example. Do the other options outside the Overton Window somehow not qualify as unsafe/extreme unethical actions (by the standards of the audience of Death with Dignity)? Has Yudkowsky changed his mind on what options would actually save the world? Does Yudkowsky think that the chances of finding a pivotal act that would significantly delay unsafe AGI are so slim that he's safe to be around despite him being unsafe in the hypothetical that such a pivotal act is achievable? I'm confused.
Also, I'm not sure how much overlap there is between people who do Bayesian updates and people for whom whatever Yudkowsky is thinking of is outside the Overton Window, but in general, if someone says that what they actually want is outside your Overton Window, I see only two directions to update in: either shift your Overton Window to include their intent, or shift your opinion of them to outside your Overton Window. If the first option isn't going to happen, as Yudkowsky says (for public discussion on LessWrong at least), that leaves the second.
[1] Compare modern estimates of the damage that would be caused by a solar flare equivalent to the Carrington Event. Factories, food supply, long-distance communication, digital currency - many critical services nowadays are dependent on compute, and that portion will only increase by the time you would actually pull the trigger.
Your method of trying to determine whether something is true or not relies overly much on feedback from strangers. Your comment demands large amounts of intellectual labor from others ('disprove why all easier modes are incorrect'), despite the preamble of the post, while seeming unwilling to put much work in yourself.
I think Yudkowsky would argue that on a scale from never learning anything to eliminating half your hypotheses per bit of novel sensory information, humans are pretty much at the bottom of the barrel.
When the AI needs to observe nature, it can rely on petabytes of publicly available datasets from particle physics to biochemistry to galactic surveys. It doesn't need any more experimental evidence to solve human physiology or build biological nanobots: we've already got quantum mechanics and human DNA sequences. The rest is just derivation of the consequences.
Sure, there are specific physical hypotheses that the AGI can't rule out because humanity hasn't gathered the evidence for them. But that, by definition, excludes anything that has ever observably affected humans. So yes, for anything that has existed since the inflationary period, the AGI will not be bottlenecked on physically gathering evidence.
I don't really get what you're pointing at with "how much AGI will be smarter than humans", so I can't really answer your last question. How much smarter than yourself would you say someone like Euler is? Is his ability to make scientific/mathematical breakthroughs proportional to your difference in smarts?
- Solve protein folding problem
- Acquire human DNA sample
- Use superintelligence to construct a functional model of human biochemistry
- Design a virus that exploits human biochemistry
- Use one of the currently available biochemistry-as-a-service providers to produce a sample that incubates the virus and then escapes their safety procedures (e.g. pay someone to mix two vials sent to them in the mail. The aerosols from the mixing infect them)
Hey, it's now officially no longer May 27th anywhere, and I can't find any announcements yet. How's it going?
Edit: Just got my acceptance letter! See you all this summer!
Sorry that automation is taking your craft. You're neither the first nor the last this will happen to. Orators, book illuminators, weavers, portrait artists, puppeteers, cartoon animators, etc. Even just in the artistic world, you're in fine company. Generally speaking, it's been good for society to free up labor for different pursuits while preserving production. The art can even be elevated as people incorporate the automata into their craft. It's a shame the original skill is lost, but if that kept us from innovating, there would be no way to get common people multiple books or multiple pictures of themselves or CGI movies. It seems fair to demand society have a way to support people whose jobs have been automated, at least until they can find something new to do. But don't get mad at the engine of progress and try to stop it - people will just cheer as it runs you over.
Before learning about reversible computation only requiring work when bits are deleted, I would have treated each of my points as roughly independent, with about 10^1.5, 10^4, 10^4, and 10^2.5 odds against respectively. The last point is now down to 10^1.5.
Dumping waste information in the baryonic world would be visible.
#1 - Caution doesn't solve problems, it finds solutions if they exist. You can't use caution to ignore air resistance when building a rocket. (Though collapse is not necessarily expected - there's plenty of interstellar dust).
#4 - I didn't know about Landauer's principle, though going by what I'm reading, you're mistaken on its interpretation - it takes 'next to nothing' times the part of the computation you throw out, not the part you read out, where the part you throw out increases proportional to the negentropy you're getting. No free lunch, still, but one whose price is deferable to the moment you run out of storage space.
That would make it possible for dark matter to be part of a computation that hasn't been read out yet, though not necessarily a major part. I'm not sure the below reasoning is correct, but the Landauer limit with the current 2.7 K universe as heat bath is 0.16 meV per bit. This means that the 'free' computational cycle you get from the fact that you only need to pay at the end would, to a maximally efficient builder, reward them with 0.16 meV extra for every piece of matter that can hold one bit. We don't yet have a lower bound for the neutrino mass, but the upper bound is 120 meV. If the upper bound is true, you would have to cram ~10^3 bits into a neutrino before using it as storage nets you more than burning it for energy (by chucking it into an evaporating black hole).
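A quick numerical check of those two figures (a back-of-the-envelope sketch of my own; the 120 meV neutrino bound is the one quoted above):

```python
import math

# Landauer limit k_B * T * ln(2), with the 2.7 K CMB as the heat bath.
k_B = 8.617e-5          # Boltzmann constant, eV/K
T_cmb = 2.7             # K
landauer_meV = k_B * T_cmb * math.log(2) * 1e3
print(f"Landauer cost at 2.7 K: {landauer_meV:.2f} meV per bit erased")  # ~0.16 meV

# Bits a neutrino would need to store before keeping it as memory beats
# burning it for energy, using the 120 meV upper bound on its mass-energy.
m_nu_meV = 120.0
print(f"Break-even bits per neutrino: ~{m_nu_meV / landauer_meV:.0f}")   # ~700, i.e. order 10^3
```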
I don't have data for #2 and #3 at hand. It's the scientific consensus, for what that's worth.
1:10^12 odds against the notion, easily. About as likely as the earth being flat.
- Dark matter does not interact locally with itself or visible matter. If it did, it would experience friction (like interstellar gas, dust and stars) and form into disk shapes when spiral galaxies form into disk shapes. A key observation of dark matter is that spiral galaxies' rotational velocity behaves as one would expect from an ellipsoid.
- The fraction of matter that is dark does not change over time, nor does the total mass of objects in the universe. Sky surveys do not find more visible matter further back in time.
- The fraction of matter that is dark does not change across space, even across distances that have not been bridgable since the inflation period of the big bang. All surveys show spherical symmetry.
- By the laws of thermodynamics, computation requires work. Low-entropy energy needs to be converted into high-entropy energy, such as heat. We do not see dark matter absorb or emit energy.
I can imagine no situation where something that is a required part of computational processes could ever present itself to us as dark matter, and no mistake in physics thorough enough to allow it.
Unlike what you would expect with black holes, we can see that the Boötes void contains very little mass by looking for gravitational lensing and the movement of surrounding galaxies.
On the Sloan Digital Sky Survey webpage, there's a list of ongoing and completed surveys, some of which went out to z=3 (10 billion years ago/away), though the more distant ones didn't use stellar emissions as output. Here is a YouTube video visualizing the data that eBOSS (a quasar study) added in 2020, but it shows it alongside visible/near-infrared galaxy data (blue to green datasets), which go up to about 6 billion years. Radial variations in density in the observed data can be explained by local obstructions (the galactic plane, gas clouds, nearby galaxies), while radially symmetric variations can be explained by different instruments' suitability to different timescales.
Just eyeballing it, it doesn't look like there are any spherical irregularities more than 0.5 billion light years across.
If you want to look more carefully, here are instructions for downloading the dataset or specific parts of it.
You should also note that Dyson spheres aren't just stars becoming invisible. Energy is conserved, so every star with a Dyson sphere around it emits the same amount of radiation as before, just shifted to a lower part of the spectrum. For example, a Dyson sphere located at 1 AU from the Sun would emit black body radiation at about 280 K. A Dyson sphere further out, at a few hundred AU, would be able to extract more negentropy at the cost of more material, and have a temperature of around 12 K - low enough to show up on WMAP (especially once redshifted by distance). I actually did my Bachelor thesis reworking some of the math on a paper that looked for circular irregularities in the WMAP data and found none.
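For the temperature scaling, here is a small sketch using the 280 K figure at 1 AU as an anchor (my own numbers for the other distances; for a passive black-body shell the absorbed flux falls off as 1/d², so the equilibrium temperature falls off as d^(-1/2)):

```python
import math

# Equilibrium temperature of a Dyson shell, scaled from the 280 K value quoted
# above for 1 AU; T ~ d^(-1/2) for a passive black-body shell.
T_1AU = 280.0  # K

def shell_temperature(d_au: float) -> float:
    """Black-body equilibrium temperature at d_au astronomical units."""
    return T_1AU / math.sqrt(d_au)

for d in (1, 5, 100, 540):
    print(f"{d:>4} AU: ~{shell_temperature(d):.0f} K")
# ->  1 AU: ~280 K,  5 AU: ~125 K,  100 AU: ~28 K,  540 AU: ~12 K
```

So under this scaling, getting down to ~12 K means putting the shell at a few hundred AU rather than at planetary distances.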
There definitely seem to be (relative) grunt work positions in AI safety, like this, this or this. Unless you think these are harmful, it seems like it would be better to direct the Alec-est Alecs of the world that way instead of risking them never contributing.
I understand not wanting to shoulder responsibility for their career personally, and I understand wanting an unbounded culture for those who thrive under those conditions, but I don't see the harm in having a parallel structure for those who do want/need guidance.
Well, it's better, but I think you're still playing into [Alec taking things you say as orders], which I claim is a thing, so that in practice Alec will predictably systematically be less helpful and more harmful than if he weren't [taking things you say as orders].
There seems to be an assumption here that Alec would do something relatively helpful instead if he weren't taking the things you say as orders. I don't think this is always the case: for people who aren't used to thinking for themselves, the problem of directing your career to reduce AI risk is not a great testbed (high stakes, slow feedback), and without guidance they can just bounce off, get stuck with decision paralysis, or listen to people who don't have qualms about giving advice.
Like, imagine Alec gives you API access to his brain, with a slider that controls how much of his daily effort he spends not following orders/doing what he thinks is best. You may observe that his slider is set lower than most productive people in AI safety, but (1) it might not help him or others to crank it up and (2) if it is helpful to crank it up, that seems like a useful order to give.
Anna's Scenario 3 seems like a good way to self-consistently nudge the slider upwards over a longer period of time, as do most of your suggestions.
This is where things go wrong. The actual credence of seeing a hypercomputer is zero, because a computationally bounded observer can never observe such an object in such a way that differentiates it from a finite approximation. As such, you should indeed have a zero percent probability of ever moving into a state in which you have performed such a verification, it is a logical impossibility. Think about what it would mean for you, a computationally bounded approximate bayesian, to come into a state of belief that you are in possession of a hypercomputer (and not a finite approximation of a hypercomputer, which is just a normal computer. Remember arbitrarily large numbers are still infinitely far away from infinity!). What evidence would you have to observe for this belief? You would need to observe literally infinite bits, and your credence to observing infinite bits should be zero, because you are computationally bounded! If you yourself are not a hypercomputer, you can never move into the state of believing a hypercomputer exists.
Sorry, I previously assigned hypercomputers a non-zero credence, and you're asking me to assign it zero credence. This requires an infinite amount of bits to update, which is impossible to collect in my computationally bounded state. Your case sounds sensible, but I literally can't receive enough evidence over the course of a lifetime to be convinced by it.
Like, intuitively, it doesn't feel literally impossible that humanity discovers a computationally unbounded process in our universe. If a convincing story is fed into my brain, with scientific consensus, personally verifying the math proof, concrete experiments indicating positive results, etc., I expect I would believe it. In my state of ignorance, I would not be surprised to find out there's a calculation which requires a computationally unbounded process to calculate but a bounded process to verify.
To actually intuitively give something 0 (or 1) credence, though, to be so confident in a thesis that you literally can't change your mind, that at the very least seems very weird. Self-referentially, I won't actually assign that situation 0 credence, but even if I'm very confident that 0 credence is correct, my actual credence will be bounded by my uncertainty in my method of calculating credence.
That's not a middle ground between a good world and a neutral world, though, that's just another way to get a good world. If we assume a good world is exponentially unlikely, a 10-year delay might mean the odds of a good world rise from 10^-10 to 10^-8 (as opposed to pursuing Clippy bringing the odds of a bad world down from 10^-4 to 10^-6).
If you disagree with Yudkowsky about his pessimism about the probability of good worlds, then my post doesn't really apply. My post is about how to handle him being correct about the odds.
That's a fair point - my model does assume AGI will come into existence in non-negative worlds. Though I struggle to actually imagine a non-negative world where humanity is alive a thousand years from now and AGI hasn't been developed. Even if all alignment researchers believed it was the right thing to pursue, which doesn't seem likely.
Both that and Q5 seem important to me.
Q5 is an exploration of my uncertainty in spite of me not being able to find faults with Clippy's argument, as well as what I expect others' hesitance might be. If Clippy's argument is correct, then the section you highlight seems like the logical conclusion.
That's the gist of it.