Posts

A distillation of Evan Hubinger's training stories (for SERI MATS) 2022-07-18T03:38:38.994Z

Comments

Comment by Daphne_W on On green · 2024-04-15T23:41:16.424Z · LW · GW

I'd say "fuck all the people who are harming nature" is black-red/rakdos's view of white-green/selesnya. The "fuck X" attitude implies a certain passion that pure black would call wasted motion. Black is about power. It's not adversarial per se, just mercenary/agentic. Meanwhile the judginess towards others is an admixture of white. Green is about appreciating what is, not endorsing or holding on to it.

Black's view of green is "careless idiots, easy to take advantage of if you catch them by surprise". When black meets green, black notices how the commune's rules would allow someone to scam them out of all their cash, and how the charms they're wearing cost ten times less to produce than what they paid for them.

Black-red/Rakdos's view of green is "tree huggers, weirdly in love with nature rather than everything else you can care about". When Rakdos meets green they're inspired to throw a party in the woods, concluding it's kinda lame without a light show or a proper toilet, and leaving tons of garbage when they return home.

Black's view of white-green/Selesnya is "people who don't seem to grasp the tragedy of the commons, and who can get obnoxiously intrusive about it. Sure, nature can be nice, but it's not worth wasting that much political capital on." When black meets Selesnya, it tries to find an angle by which Selesnya can give it more power. Maybe an ecological development grant that has shoddy selection criteria, or a lopsided political deal.

Meanwhile, black-green/Golgari's view is "It is natural for people to be selfish. Everyone chooses themselves eventually; that's an evolutionary given. I will make selfish choices and appreciate the world, as any sane person would". It views Selesnya as a grift, green as passive, black as self-absorbed, and Rakdos as irrational.

I would say ecofascism is white-green-black/Abzan: the hard agency of black, the communal approach of white, and the appreciation of nature of green, but lacking the academic rigor of blue or the wild passion of red.

Comment by Daphne_W on On green · 2024-04-15T22:47:03.823Z · LW · GW

Fighting with tigers is red-green, or Gruul in MTG terminology: the passionate, anarchic struggle of nature red in tooth and claw. Using natural systems to stay alive even as that use destroys them is black-green, or Golgari: rot, swarms, reckless consumption that overwhelms.

Pure green is a group of prehistoric humans sitting around a campfire, sharing ghost stories and gazing at the stars. It's a cave filled with the handprints of hundreds of generations that came before. It's cats lounging in a sunbeam or birds preening their feathers. It's rabbits huddling up in their dens until the weather is better, capybaras and monkeys in hot springs, and bears lazily going to hibernate. These have intelligible justifications, sure, but what do these animals experience while engaging in these activities?

Most vertebrates seem to have a sense of green, of relaxation and watching the world flow by. Physiologically, when humans and other animals relax, the sympathetic nervous system is suppressed and the parasympathetic system stays or becomes active. This relaxes the muscles and causes the bloodstream to prioritize digestion. For humans at least, stress and the pressure to find solutions right now decrease, and the mind wanders. Attention loses its focus but remains high-bandwidth. This green state is where people most often come up with 'creative' solutions that draw on a holistic understanding of the situation.

Green is the notion that you don't have to strive towards anything, and the moment an animal does need to strive for something, it mixes in red, blue, black, or white, depending on what the situation calls for and on the animal's evolved toolset of colors.

The colors exist because no color on its own is viable. Green can't keep you alive, and that's okay, it isn't meant to.

Comment by Daphne_W on If you weren't such an idiot... · 2024-03-04T12:47:46.715Z · LW · GW

That doesn't seem like a good idea. You're ignoring long-term harms and benefits of the activity - otherwise cycling would be net positive - and you're ignoring activity duration. People don't commute to work by climbing Mount Everest or going skydiving.

Comment by Daphne_W on Social Dark Matter · 2023-11-22T08:29:12.037Z · LW · GW

I don't think it's precisely true. The serene antagonism that comes from having examined something and recognizing that it is worth taking your effort to destroy is different from the hot rage of offense. But of the two, I expect antagonism to be more effective in the long term.

  • Rage is accompanied with a surge of adrenalin, sympathetic nervous activation, and usually parasympathetic nervous suppression, that is not sustainable in the long term. Antagonism is compatible with physiological rest and changes in the environment.
  • Consequently, antagonism has access to system 2 and long term planning, while rage tends to have a short term view with limited information processing capabilities.
  • Even when your antagonism calls for rapid physical action and rage, having a better understanding of the situation prevents you from being held back by doubt when you encounter (emotional) evidence that doesn't fit your current tack. The release of adrenalin and start of rage can then reliably be triggered by the feeling that you have unhindered access to the object of hatred.
  • It's also possible when coming from calm antagonism to choose between rage and the state of both high parasympathetic and high sympathetic activation, where you're active but still have high sensory processing bandwidth (see also runner's high, sexual activity, or being 'in the zone' with sports or high-apm games), which for anger might be called pugnacity or bloodlust or simply an eagerness to fight.

Rage is good for punching the baddies in front of you in the face if you can take them in a straight fight. Pugnacity is good for systematically outmaneuvering their defenses and finding the path to victory in combat. Antagonism is good for making their death a week from now look like an accident, or to arrange a situation where rage and pugnacity can do their jobs unhindered.

but people recently have been arguing to me that the coming and going of emotions is a much more random process influenced by chemicals and immediate environment and so on.

I don't feel like 'random' is an accurate word here. 'Stochastic' might be better. Environmental factors like interior design and chemical influences like blood sugar have major effects, but these effects are enumerable and vary little across cultures, ages, etc.

Given how stochastic your emotional responses are, it's best not to rely on intense emotions for any sort of judgment. If you can't tell whether you're raging because someone said something intolerable or because your blood sugar is low, so your parasympathetic nervous activation is low, so you couldn't process the nuance of their statements, better not act on that rage until you've had something to eat. If you can't tell whether you're fine with what someone said because they probably didn't mean it as badly as it sounds or because you're tired, so your sympathetic nervous activation is low, better not commit to condoning it until you've had a nap.

Comment by Daphne_W on Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue) · 2022-11-23T05:55:27.842Z · LW · GW

As far as I can tell, the AI has no specialized architecture for deciding about its future strategies or giving semantic meaning to its words. Its outputting the string "I will keep Gal a DMZ" does not have the semantic meaning of it committing to keep troops out of Gal. It's just the phrase that players who are most likely to win use in that board state, given its internal strategy.

Chess grandmasters were outperformed by a simple search tree when chess was supposed to be the peak of human intelligence; I think this will have the same disenchanting effect on the game of Diplomacy. Humans are not decision-theoretic geniuses; just saying whatever people want to hear while playing optimally for yourself is sufficient to win. There may be a level of play where decision theory and commitments are relevant, but humans just aren't that good.

That said, I think this is actually a good reason to update towards freaking out. It's happened quite a few times now that 'naive' big milestones have been hit unexpectedly soon "without any major innovations or new techniques" - chess, Go, StarCraft, Dota, GPT-3, DALL-E, and now Diplomacy. It's starting to look like humans are less complicated than we thought - more like a bunch of current-level AI architectures squished together in the same brain (with some capacity to train new ones in deployment) than like a powerful, generally applicable intelligence. Or a room full of toddlers with superpowers, to use the CFAR phrase. While this doesn't increase our estimates of the rate of AI development, it does suggest that the goalpost for superhuman intellectual performance in all areas is closer than we might have thought otherwise.

Comment by Daphne_W on SERI MATS Program - Winter 2022 Cohort · 2022-10-12T22:09:00.508Z · LW · GW

Dear M.Y. Zuo,

 

I hope you are well.

It is my experience that the conventions of e-mail are significantly more formal and precise in expectation when it comes to phrasing. Discord and Slack, on the other hand, have an air of informal chatting, which makes it feel more acceptable to use shortcuts and to phrase things less carefully. While feelings may differ between people and conventions between groups, I am quite confident that these conventions are common due to both media's origins, as a replacement for letters and memos and as a replacement for in-person communication respectively.

Don't hesitate to ask if you have any further questions.

Best regards,

Daphne Will


I don't think that's really true. People are a lot more informal on Discord than e-mail because of where they're both derived from.

Comment by Daphne_W on Yudkowsky vs Trump: the nuclear showdown. · 2022-07-11T19:12:59.456Z · LW · GW

That's a bit of a straw man, though to be fair it appears my question didn't fit into your world model as it does in mine.

For me, the insurrection was in the top 5 most informative/surprising US political events of 2017-2021. On account of its failure it didn't have consequences as major as some other events, but it caused me to update my world model more. It was a sudden confrontation with the size and influence of anti-democratic movements within the Republican party, which I consider Trump to be sufficiently associated with to cringe from the notion of voting for him.

The core of my question is whether your world model has updated from

Given our invincible military, the only danger to us is a nuclear war (meaning Russia).

For me, the January insurrection was a big update away from that statement, so I was curious how it fit in your world model, but I suppose the insurrection is not necessarily the key. Did your probability of (a subset of) Republicans ending American democracy increase over the Trump presidency?

Noting that a Republican terrorist might still have attempted to commit acts of terror with Clinton in office does not mitigate the threat posed by (a subset of) Republicans. Between self-identified Democrats pissing off a nuclear power enough to start a world war and self-identified Republicans causing the US to no longer have functional elections, my money is on the latter.

If I had to use a counterfactual, I would propose imagining a world where the political opinions of all US citizens as projected on a left-right axis were 0.2 standard deviations further to the Left (or Right).

Comment by Daphne_W on Yudkowsky vs Trump: the nuclear showdown. · 2022-07-11T07:54:17.501Z · LW · GW

With Trump/Republicans I meant the full range of questions, from just Trump, through participants in the storming of Congress, to all Republican voters.

It seems quite easy for a large fraction of a population to be a threat to the population's interests if they share a particular dangerous behavior. I'm confused why you would think that would be difficult. Threat isn't complete or total. If you don't get a vaccine or wear a mask, you're a threat to immunocompromised people, but you can still do good work professionally. If you vote for someone attempting to overthrow democracy, you're a danger to the nation while in the voting booth, but you can still do good work volunteering. As for how the nation can survive such a large fraction working against its interests - it wouldn't, in equilibrium, but there's a lot of inertia.

It seems weird that people storming the halls of Congress, building gallows for a person certifying the transition of power, and killing and getting killed attempting to reach that person, would lead to no update at all on who is a threat to America. I suppose you could have factored this sort of thing in from the start, but in that case I'm curious how you would have updated on potential threats to America if the insurrection didn't take place.

Ultimately the definition of 'threat' feels like a red herring compared to the updates in the world model. So perhaps more concretely: what's the minimum level of violence at the insurrection that would make you have preferred Hillary over Trump? How many Democratic congresspeople would have to die? How many Republican congresspeople? How many members of the presidential chain of command (old or new)?

Comment by Daphne_W on Yudkowsky vs Trump: the nuclear showdown. · 2022-06-29T12:31:03.819Z · LW · GW

Hey, I stumbled on this comment and I'm wondering if you've updated on whether you consider Trump/Republicans a threat to America's interests in light of the January 6th insurrection.

Comment by Daphne_W on Conversation with Eliezer: What do you want the system to do? · 2022-06-28T02:32:41.546Z · LW · GW

People currently give MIRI money in the hopes they will use it for alignment. Those people can't explain concretely what MIRI will do to help alignment. By your standard, should anyone give MIRI money?

When you're part of a cooperative effort, you're going to be handing off tools to people (either now or in the future) which they'll use in ways you don't understand and can't express. Making people feel foolish for being a long inferential distance away from the solution discourages them from laying groundwork that may well be necessary for progress, or even from exploring.

Comment by Daphne_W on Air Conditioner Test Results & Discussion · 2022-06-24T05:54:45.581Z · LW · GW

As a concrete example of rational one-hosing, here in the Netherlands it rarely gets hot enough that ACs are necessary, but when it does a bunch of elderly people die of heat stroke. Thus, ACs are expected to run only several days per year (so efficiency concerns are negligible), but having one can save your life.

I checked the biggest Dutch-only consumer-facing online retailer for various goods (bol.com). Unfortunately I looked before making a prediction for how many one-hose vs two-hose models they sell, but even conditional on me choosing to make a point of this, it still seems like it could be useful for readers to make a prediction at this point. Out of 694 models of air conditioner labeled as either one-hose or two-hose,

3

are two-hose.

This seems like strong evidence that the market successfully adapts to actual consumer needs where air conditioner hose count is concerned.

Comment by Daphne_W on [deleted post] 2022-06-18T07:51:38.518Z

It feels more to me like we're the quiet weird kid in high school who doesn't speak up or show emotion because we're afraid of getting judged or bullied. Which, fair enough, the school is sort of like that - just look at poor cryonics, or even nuclear power - but the road to popularity (let alone getting help with what's bugging us) isn't to minimize our expressions to 'proper' behavior while letting ourselves be characterized by embarrassing past incidents (e.g. Roko's Basilisk) if we're noticed at all.

It isn't easy to build social status, but right now we're trying next to nothing, and we've seen that it doesn't seem to be enough.

Comment by Daphne_W on A claim that Google's LaMDA is sentient · 2022-06-12T08:40:13.122Z · LW · GW

Agree that it's too shallow to take seriously, but

If it answered "you would say during text input batch 10-203 in January 2022, but subjectively it was about three million human years ago" that would be something else.

only seems to capture AI that managed to gradient hack the training mechanism to pass along its training metadata and subjective experience/continuity. If a language model were sentient in each separate forward pass, I would imagine it would vaguely remember/recognize things from its training dataset without necessarily being able to place them, like a human when asked when they learned how to write the letter 'g'.

Comment by Daphne_W on AGI Ruin: A List of Lethalities · 2022-06-08T10:45:41.910Z · LW · GW

Interventions on the order of burning all GPUs in clusters larger than 4 and preventing any new clusters from being made, including the reaction of existing political entities to that event and the many interest groups who would try to shut you down and build new GPU factories or clusters hidden from the means you'd used to burn them, would in fact really actually save the world for an extended period of time and imply a drastically different gameboard offering new hopes and options.

I suppose 'on the order of' is the operative phrase here, but that specific scenario seems like it would be extremely difficult to specify an AGI for without disastrous side effects, and like it still wouldn't be enough. Other, less efficient or less well-developed forms of compute exist, and preventing humans from organizing to find a way around the GPU-burner's blacklist for unaligned AGI research, while differentially allowing them to find a way to build friendly AGI, seems like it would require a lot of psychological/political finesse on the GPU-burner's part. It's on the level of Ozymandias from Watchmen, but it's cartoonish supervillainy nonetheless.

I guess my main issue is a matter of trust. You can say the right words, as all the best supervillains do, promising that the appropriate cautions are taken above our clearance level. You've pointed out plenty of mistakes you could be making, and the ease with which one can make mistakes in situations such as yours, but acknowledging potential errors doesn't prevent you from making them. I don't expect you to have many people you would trust with AGI, and I expect that circle would shrink further if those people said they would use the AGI to do awful things iff it would actually save the world [in their best judgment]. I currently have no-one in the second circle.

If you've got a better procedure for people to learn to trust you, go ahead, but is there something like an audit you've participated in/would be willing to participate in? Any references regarding your upstanding moral reasoning in high-stakes situations that have been resolved? Checks and balances in case of your hardware being corrupted?

You may be the audience member rolling their eyes at the cartoon supervillain, but I want to be the audience member rolling their eyes at HJPEV when he has a conversation with Quirrell and doesn't realise that Quirrell is evil.

Comment by Daphne_W on AGI Ruin: A List of Lethalities · 2022-06-07T21:14:55.996Z · LW · GW

AI can run on CPUs (with a certain inefficiency factor), so burning all GPUs alone doesn't seem like it would be sufficient. As for disruptive acts that are less deadly, it would be nice to have some examples, but Eliezer says they're too far out of the Overton Window to mention.

If what you're saying about Eliezer's claim is accurate, it does seem disingenuous to frame "The only worlds where humanity survives are ones where people like me do something extreme and unethical" as "I won't do anything extreme and unethical [because humanity is doomed anyway]". It makes Eliezer dangerous to be around if he's mistaken, and if you're significantly less pessimistic than he is (if you assign >10^-6 probability to humanity surviving), he's mistaken in most of the worlds where humanity survives. Which are the worlds that matter the most.

And yeah, it's nice that Eliezer claims that Eliezer can violate ethical injunctions because he's smart enough, after repeatedly stating that people who violate ethical injunctions because they think they're smart enough are almost always wrong. I don't doubt he'll pick the option that looks actually better to him. It's just that he's only human - he's running on corrupted hardware like the rest of us.

Comment by Daphne_W on AGI Ruin: A List of Lethalities · 2022-06-07T12:35:57.666Z · LW · GW

I'm confused about A6, from which I get "Yudkowsky is aiming for a pivotal act to prevent the formation of unaligned AGI that's outside the Overton Window and on the order of burning all GPUs". This seems counter to the notion in Q4 of Death with Dignity where Yudkowsky says

It's relatively safe to be around an Eliezer Yudkowsky while the world is ending, because he's not going to do anything extreme and unethical unless it would really actually save the world in real life, and there are no extreme unethical actions that would really actually save the world the way these things play out in real life, and he knows that.  He knows that the next stupid sacrifice-of-ethics proposed won't work to save the world either, actually in real life. 

I would estimate that burning all AGI-capable compute would disrupt every sector of the global economy for years and cause tens of millions of deaths[1], and that's what Yudkowsky considers the more mentionable example. Do the other options outside the Overton Window somehow not qualify as unsafe/extreme unethical actions (by the standards of the audience of Death with Dignity)? Has Yudkowsky changed his mind on what options would actually save the world? Does Yudkowsky think that the chances of finding a pivotal act that would significantly delay unsafe AGI are so slim that he's safe to be around despite him being unsafe in the hypothetical that such a pivotal act is achievable? I'm confused.

Also, I'm not sure how much overlap there is between people who do Bayesian updates and people for whom whatever Yudkowsky is thinking of is outside the Overton Window, but in general, if someone says that what they actually want is outside your Overton Window, I see only two directions to update in: either shift your Overton Window to include their intent, or shift your opinion of them to outside your Overton Window. If the first option isn't going to happen, as Yudkowsky says (for public discussion on LessWrong at least), that leaves the second.

  1. ^

    Compare modern estimates of the damage that would be caused by a solar flare equivalent to the Carrington Event. Factories, food supply, long-distance communication, digital currency - many critical services nowadays are dependent on compute, and that portion will only increase by the time you would actually pull the trigger.

Comment by Daphne_W on AGI Ruin: A List of Lethalities · 2022-06-06T09:20:32.942Z · LW · GW

Your method of trying to determine whether something is true or not relies overly much on feedback from strangers. Your comment demands large amounts of intellectual labor from others ('disprove why all easier modes are incorrect'), despite the preamble of the post, while seeming unwilling to put much work in yourself.

Comment by Daphne_W on AGI Ruin: A List of Lethalities · 2022-06-06T08:30:52.501Z · LW · GW

I think Yudkowsky would argue that on a scale from never learning anything to eliminating half your hypotheses per bit of novel sensory information, humans are pretty much at the bottom of the barrel.

When the AI needs to observe nature, it can rely on petabytes of publicly available datasets from particle physics to biochemistry to galactic surveys. It doesn't need any more experimental evidence to solve human physiology or build biological nanobots: we've already got quantum mechanics and human DNA sequences. The rest is just derivation of the consequences.

Sure, there are specific physical hypotheses that the AGI can't rule out because humanity hasn't gathered the evidence for them. But that, by definition, excludes anything that has ever observably affected humans. So yes, for anything that has existed since the inflationary period, the AGI will not be bottlenecked on physically gathering evidence.

I don't really get what you're pointing at with "how much AGI will be smarter than humans", so I can't really answer your last question. How much smarter than yourself would you say someone like Euler is? Is his ability to make scientific/mathematical breakthroughs proportional to your difference in smarts?

Comment by Daphne_W on AGI Ruin: A List of Lethalities · 2022-06-06T07:35:48.759Z · LW · GW
  • Solve protein folding problem
  • Acquire human DNA sample
  • Use superintelligence to construct a functional model of human biochemistry
  • Design a virus that exploits human biochemistry
  • Use one of the currently available biochemistry-as-a-service providers to produce a sample that incubates the virus and then escapes their safety procedures (e.g. pay someone to mix two vials sent to them in the mail. The aerosols from the mixing infect them)
Comment by Daphne_W on SERI ML Alignment Theory Scholars Program 2022 · 2022-05-28T16:13:11.741Z · LW · GW

Hey, it's now officially no longer May 27th anywhere, and I can't find any announcements yet. How's it going?

Edit: Just got my acceptance letter! See you all this summer!

Comment by Daphne_W on What DALL-E 2 can and cannot do · 2022-05-14T22:06:26.487Z · LW · GW

Sorry that automation is taking your craft. You're neither the first nor the last this will happen to. Orators, book illuminators, weavers, portrait artists, puppeteers, cartoon animators, etc. Even just in the artistic world, you're in fine company. Generally speaking, it's been good for society to free up labor for different pursuits while preserving production. The art can even be elevated as people incorporate the automata into their craft. It's a shame the original skill is lost, but if that kept us from innovating, there would be no way to get common people multiple books or multiple pictures of themselves or CGI movies. It seems fair to demand society have a way to support people whose jobs have been automated, at least until they can find something new to do. But don't get mad at the engine of progress and try to stop it - people will just cheer as it runs you over.

Comment by Daphne_W on How confident are we that there are no Extremely Obvious Aliens? · 2022-05-07T11:50:13.444Z · LW · GW

Before learning about reversible computation only requiring work when bits are deleted, I would have treated each of my points as roughly independent, with about 10^1.5, 10^4, 10^4, and 10^2.5 odds against respectively. The last point is now down to 10^1.5.
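As a quick sketch of how those factors combine, assuming the four points really are independent as stated (the original exponents multiply out to the 1:10^12 figure in my comment further down this page):

```python
# Sketch: combining the stated per-point odds against, assuming independence.
# Exponents are the log10 odds given above.
original = [1.5, 4, 4, 2.5]
updated = [1.5, 4, 4, 1.5]  # last point revised down after learning about reversible computation

print(f"original combined odds against: 10^{sum(original):.1f}")  # 10^12.0
print(f"updated combined odds against:  10^{sum(updated):.1f}")   # 10^11.0
```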

Dumping waste information in the baryonic world would be visible.

Comment by Daphne_W on How confident are we that there are no Extremely Obvious Aliens? · 2022-05-07T11:47:27.097Z · LW · GW

#1 - Caution doesn't solve problems, it finds solutions if they exist. You can't use caution to ignore air resistance when building a rocket. (Though collapse is not necessarily expected - there's plenty of interstellar dust).

#4 - I didn't know about Landauer's principle, though going by what I'm reading, you're mistaken on its interpretation - it takes 'next to nothing' times the part of the computation you throw out, not the part you read out, where the part you throw out increases in proportion to the negentropy you're getting. No free lunch, still, but one whose price is deferrable to the moment you run out of storage space.

That would make it possible for dark matter to be part of a computation that hasn't been read out yet, though not necessarily a major part. I'm not sure the reasoning below is correct, but the Landauer limit with the current 2.7 K universe as heat bath is 0.16 meV per bit. This means that the 'free' computational cycle you get from the fact that you only need to pay at the end would, to a maximally efficient builder, reward them with an extra 0.16 meV for every piece of matter that can hold one bit. We don't yet have a lower bound for the neutrino mass, but the upper bound is 120 meV. If the upper bound is true, that would mean you would have to cram about 10^3 bits into a neutrino before using it as storage nets you more than burning it for energy (by chucking it into an evaporating black hole).
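A rough numerical sketch of that estimate (the 2.7 K heat-bath temperature and the 120 meV neutrino-mass upper bound are the figures quoted above; the rest is the textbook Landauer bound k_B·T·ln 2):

```python
import math

k_B = 8.617333e-5   # Boltzmann constant in eV/K
T_cmb = 2.7         # K, current CMB temperature used as the heat bath

landauer_eV = k_B * T_cmb * math.log(2)
print(f"Landauer limit: {landauer_eV * 1e3:.2f} meV per erased bit")  # ~0.16 meV

m_nu_upper_meV = 120  # meV, neutrino mass upper bound quoted above
bits_to_break_even = m_nu_upper_meV / (landauer_eV * 1e3)
# roughly 7e2 bits, consistent with the ~10^3 figure above
print(f"bits per neutrino before storage beats burning it for energy: ~{bits_to_break_even:.0f}")
```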

I don't have data for #2 and #3 at hand. It's the scientific consensus, for what that's worth.

Comment by Daphne_W on How confident are we that there are no Extremely Obvious Aliens? · 2022-05-03T08:23:29.181Z · LW · GW

1:10^12 odds against the notion, easily. About as likely as the earth being flat.

  1. Dark matter does not interact locally with itself or visible matter. If it did, it would experience friction (like interstellar gas, dust and stars) and form into disk shapes when spiral galaxies form into disk shapes. A key observation of dark matter is that spiral galaxies' rotational velocity behaves as one would expect from an ellipsoid.
  2. The fraction of matter that is dark does not change over time, nor does the total mass of objects in the universe. Sky surveys do not find more visible matter further back in time.
  3. The fraction of matter that is dark does not change across space, even across distances that have not been bridgeable since the inflationary period of the Big Bang. All surveys show spherical symmetry.
  4. By the laws of thermodynamics, computation requires work. Low-entropy energy needs to be converted into high-entropy energy, such as heat. We do not see dark matter absorb or emit energy.

I can imagine no situation where something that is a required part of computational processes could ever present itself to us as dark matter, and no mistake in physics thorough enough to allow it.

Comment by Daphne_W on How confident are we that there are no Extremely Obvious Aliens? · 2022-05-03T07:47:30.590Z · LW · GW

Unlike what you would expect with black holes, we can see that the Boötes void contains very little mass by looking for gravitational lensing and the movement of surrounding galaxies.

Comment by Daphne_W on How confident are we that there are no Extremely Obvious Aliens? · 2022-05-03T05:25:29.595Z · LW · GW

On the Sloan Digital Sky Survey (SDSS) webpage, there's a list of ongoing and completed surveys, some of which went out to z=3 (10 billion years ago/away), though the more distant ones didn't use stellar emissions as output. Here is a YouTube video visualizing the data that eBOSS (a quasar study) added in 2020, but it shows it alongside visible/near-infrared galaxy data (the blue to green datasets), which go up to about 6 billion years. Radial variations in density in the observed data can be explained by local obstructions (the galactic plane, gas clouds, nearby galaxies), while radially symmetric variations can be explained by different instruments' suitability to different timescales.

Just eyeballing it, it doesn't look like there are any spherical irregularities more than 0.5 billion light years across.

If you want to look more carefully, here are instructions for downloading the dataset or specific parts of it.

You should also note that Dyson spheres aren't just stars becoming invisible. Energy is conserved, so every star with a Dyson sphere around it emits the same amount of radiation as before; it's just shifted to a lower part of the spectrum. For example, a Dyson sphere located at 1 AU from the Sun would emit black-body radiation at about 280 K. A Dyson sphere at 5 AU would be able to extract more negentropy at the cost of more material, and have a temperature of 12 K - low enough to show up on WMAP (especially once redshifted by distance). I actually did my Bachelor thesis reworking some of the math in a paper that looked for circular irregularities in the WMAP data and found none.

Comment by Daphne_W on Narrative Syncing · 2022-05-02T08:50:49.479Z · LW · GW

There definitely seem to be (relative) grunt work positions in AI safety, like this, this or this. Unless you think these are harmful, it seems like it would be better to direct the Alec-est Alecs of the world that way instead of risking them never contributing.

I understand not wanting to shoulder responsibility for their career personally, and I understand wanting an unbounded culture for those who thrive under those conditions, but I don't see the harm in having a parallel structure for those who do want/need guidance.

Comment by Daphne_W on Narrative Syncing · 2022-05-01T17:35:36.093Z · LW · GW

Well, it's better, but in I think you're still playing into [Alec taking things you say as orders], which I claim is a thing, so that in practice Alec will predictably systematically be less helpful and more harmful than if he weren't [taking things you say as orders].

There seems to be an assumption here that Alec would do something relatively helpful instead if he weren't taking the things you say as orders. I don't think this is always the case: for people who aren't used to thinking for themselves, the problem of directing your career to reduce AI risk is not a great testbed (high stakes, slow feedback), and without guidance they can just bounce off, get stuck with decision paralysis, or listen to people who don't have qualms about giving advice.

Like, imagine Alec gives you API access to his brain, with a slider that controls how much of his daily effort he spends not following orders and instead doing what he thinks is best. You may observe that his slider is set lower than that of most productive people in AI safety, but (1) it might not help him or others to crank it up, and (2) if it is helpful to crank it up, that seems like a useful order to give.

Anna's Scenario 3 seems like a good way to self-consistently nudge the slider upwards over a longer period of time, as do most of your suggestions.

Comment by Daphne_W on On infinite ethics · 2022-04-14T07:40:08.110Z · LW · GW

This is where things go wrong. The actual credence of seeing a hypercomputer is zero, because a computationally bounded observer can never observe such an object in such a way that differentiates it from a finite approximation. As such, you should indeed have a zero percent probability of ever moving into a state in which you have performed such a verification, it is a logical impossibility. Think about what it would mean for you, a computationally bounded approximate bayesian, to come into a state of belief that you are in possession of a hypercomputer (and not a finite approximation of a hypercomputer, which is just a normal computer. Remember arbitrarily large numbers are still infinitely far away from infinity!). What evidence would you have to observe for this belief? You would need to observe literally infinite bits, and your credence to observing infinite bits should be zero, because you are computationally bounded! If you yourself are not a hypercomputer, you can never move into the state of believing a hypercomputer exists.

 

Sorry, I previously assigned hypercomputers a non-zero credence, and you're asking me to assign them zero credence. This requires an infinite number of bits to update, which is impossible to collect in my computationally bounded state. Your case sounds sensible, but I literally can't receive enough evidence over the course of a lifetime to be convinced by it.

Like, intuitively, it doesn't feel literally impossible that humanity discovers a computationally unbounded process in our universe. If a convincing story is fed into my brain, with scientific consensus, personally verifying the math proof, concrete experiments indicating positive results, etc., I expect I would believe it. In my state of ignorance, I would not be surprised to find out there's a calculation which requires a computationally unbounded process to calculate but a bounded process to verify.

To actually intuitively give something 0 (or 1) credence, though, to be so confident in a thesis that you literally can't change your mind, that at the very least seems very weird. Self-referentially, I won't actually assign that situation 0 credence, but even if I'm very confident that 0 credence is correct, my actual credence will be bounded by my uncertainty in my method of calculating credence.

Comment by Daphne_W on [deleted post] 2022-04-13T13:46:15.487Z

That's not a middle ground between a good world and a neutral world, though, that's just another way to get a good world. If we assume a good world is exponentially unlikely, a 10-year delay might mean the odds of a good world rise from 10^-10 to 10^-8 (as opposed to pursuing Clippy bringing the odds of a bad world down from 10^-4 to 10^-6).

If you disagree with Yudkowsky about his pessimism about the probability of good worlds, then my post doesn't really apply. My post is about how to handle him being correct about the odds.

Comment by Daphne_W on [deleted post] 2022-04-12T16:46:47.804Z

That's a fair point - my model does assume AGI will come into existence in non-negative worlds. Though I struggle to actually imagine a non-negative world where humanity is alive a thousand years from now and AGI hasn't been developed. Even if all alignment researchers believed it was the right thing to pursue, which doesn't seem likely.

Comment by Daphne_W on [deleted post] 2022-04-12T07:50:36.176Z

Both that and Q5 seem important to me.

Q5 is an exploration of my uncertainty in spite of me not being able to find faults with Clippy's argument, as well as what I expect others' hesitance might be. If Clippy's argument is correct, then the section you highlight seems like the logical conclusion. 

Comment by Daphne_W on [deleted post] 2022-04-12T07:35:53.384Z

That's the gist of it.

Comment by Daphne_W on [deleted post] 2022-04-12T07:34:59.819Z

I'm not well-versed enough to offer something that would qualify as proof, but intuitively I would say "All problems with making a tiling bot robust are also found in aligning something with human values, but aligning something with human values comes with a host of additional problems, each of which takes additional effort". We can write a tiling bot for a grid world, but we can't write an entity that follows human values in a grid world. Tiling bots don't need to be complicated or clever, they might not even have to qualify as AGI - they just have to be capable of taking over the world.

All of that said, I strongly encourage the most possible caution with this post. Creating a "neutral" AGI is still a very evil act, even if it is the act with the highest expected utility.

Q5 of Yudkowsky's post seems like an expert opinion that this sort of caution isn't productive. What I present here seems like a natural result of combining awareness of s-risk with the low probability of good futures that Yudkowsky asserts, so I don't think security through obscurity offers much protection. In the likely event that the evil thing is bad, it seems best to discuss it openly so that the error can be made plain for everyone and people don't get stuck believing it is the right thing to do or worrying that others believe it is the right thing to do. In the unlikely event that it is good, I don't want to waste time personally gathering enough evidence to become confident enough to act on it when others might have more evidence readily available.

Comment by Daphne_W on MIRI announces new "Death With Dignity" strategy · 2022-04-05T10:31:10.378Z · LW · GW

As fictional characters popular among humans, what attitude is present in them is evidence for what sort of attitude humans like to see or inhabit. As author of those characters, Yudkowsky should be aware of this mechanism. And empirically, people with accurate beliefs and positive attitudes outperform people with accurate beliefs and negative attitudes. It seems plausible Yudkowsky is aware of this as well.

"Death with dignity" reads as an unnecessarily negative attitude to accompany the near-certainty of doom. Heroism, maximum probability of catgirls, or even just raw log-odds-of-survival seem like they would be more motivating than dignity without sacrificing accuracy.

Like, just substitute all instances of 'dignity' in the OP with 'heroism' and naively I would expect this post to have a better impact (/be more dignified/be more heroic), except insofar as it might give a less accurate impression of Yudkowsky's mood. But few people have actually engaged with him on that front.

Comment by Daphne_W on What an actually pessimistic containment strategy looks like · 2022-04-05T08:16:27.583Z · LW · GW

Are you aware of Effective Altruism's AI governance branch? I didn't look into it in detail myself, but there are definitely dozens of people already working on outreach strategies that they believe to be the most effective. FHI, CSER, AI-FAR, GovAI, and undoubtedly more groups have projects ongoing for outreach, political intervention, etc. with regards to AI Safety. If you want to spend your marginal time on stuff like this, contact them.

It does appear true that the lesswrong/rationalist community is less engaged with this strategy than might be wise, but I'm curious whether those organisations would say that people currently working on technical alignment research should switch to governance/activism, and what their opinion is on activism. 80,000 Hours places AI technical research above AI governance in their career impact stack, though personal fit plays a major part.

Comment by Daphne_W on Precognition · 2021-07-13T07:13:52.825Z · LW · GW

I was relying on your numbers that 1 microcovid = $0.01 at $10M per life, and that you believe lives should currently be valued at $10B. Have you changed your mind in either of these areas?

Going by your link, wearing a mask during that trip would save about 0.5 microcovids if you're vaccinated. Where I live, a disposable mask is about $0.25. I would also not accept $10 to wear a mask while alone for an hour per day for a year, so my discomfort is at least $0.03 per hour. I also care about my appearance, status, and the ease of communication for people seeing my lips move, which even for a grocery trip amounts to several cents per hour of cost for wearing a mask.

So, for me, wearing a mask while going grocery shopping in SF for an hour would cost upwards of $0.30. If you already have a rotating set of non-disposable masks so the material costs are negligible on the margin, I would still rate the discomfort and social effects at upwards of $0.05, which means you would have to value your life at at least $100M, or $600M when including disposable or one-time investment material costs. Again, there are lots of trades that should be more valuable than this, including ordering your groceries to be delivered to avoid traffic accidents.
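A minimal sketch of the break-even arithmetic behind those thresholds (the $0.01-per-microcovid-at-$10M anchor and the 0.5 microcovid saving are the numbers quoted above; the mask costs are my estimates):

```python
# Sketch: how much a life would need to be worth for the mask to pay off,
# assuming the dollar value of a microcovid scales linearly with the value
# placed on a life, anchored at $0.01 per microcovid at a $10M life.
def required_life_value(mask_cost_dollars, microcovids_saved=0.5,
                        dollars_per_microcovid_at_10M=0.01):
    # benefit = microcovids_saved * dollars_per_microcovid_at_10M * (V / $10M)
    # setting benefit >= mask_cost and solving for V:
    return mask_cost_dollars * 10e6 / (microcovids_saved * dollars_per_microcovid_at_10M)

print(required_life_value(0.05))  # discomfort/social cost only: ~$100M
print(required_life_value(0.30))  # including material costs:    ~$600M
```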

Comment by Daphne_W on Precognition · 2021-07-02T10:53:10.201Z · LW · GW

You're acting like wearing a mask is the only trade you can do to prolong your life. But there are many investments/interventions into your own longevity that are more effective per dollar that you probably haven't taken yet.

Do you spend more than two hours per day doing exercise? Do you check your micronutrient blood levels regularly? Do you refuse to go outside in fear of traffic accidents or muggings or skin cancer? Have you moved to an area with little air pollutants and good access to medical/cryogenic emergency services? Have you read everything on the internet that has a better than 1:100 million chance per hour of reading of saving your life? Are you always accompanied by your personal full-time physician?

Before you get to $1 per 10^-10 chance to preserve your life for the singularity, there are a lot of interventions you can take, many of which are expensive or time-consuming. Opportunity cost ruins it for all but the richest of people.

And that's just regarding your own life. Other people have a chance of post-singularity immortality too, so rather than a small chance of saving yourself, you could go for a much larger chance at saving someone else. EA charities can manage about $1 per 10^-4 chance to preserve a life for the singularity, and investing in friendly singularity research is almost certainly even more valuable given your long-termist argument for self-preservation.

Comment by Daphne_W on What are all these children doing in my ponds? · 2021-04-04T06:09:49.625Z · LW · GW
  1. Real-world personal finance isn't much of a red queen race. It costs an almost fixed amount of money to stay alive each year, and an almost fixed amount of money to raise a child to adulthood. If you make over $100k a year, you can comfortably raise at least one child and spend any excess money or effort on charity.

  2. We're probably not in a timeframe where evolution will matter. There are several existential risks and several technologies that render evolution obsolete that are likely to occur before most lesswrong readers would normally die. Having children for evolutionary purposes doesn't seem like an effective strategy for promoting altruism.

  3. We're not at a stage of popularity where raising a child to hopefully agree with us to be charitable 25 years from now is a more viable memetic strategy than spending the years that would be spent child-rearing trying to coax people into taking off their earmuffs. Which in turn may be less efficient than just putting your nose to the grindstone and making a proof-of-concept child-saving robot to show off and use.

The camera crew and autobiography strategy is the one Peter Singer personally uses.