Posts

Security Mindset - Fire Alarms and Trigger Signatures 2023-02-09T21:15:59.129Z
Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment 2022-06-21T23:55:39.918Z

Comments

Comment by elspood on Conjecture: Internal Infohazard Policy · 2022-09-12T00:51:45.485Z · LW · GW

This is a great draft and you have collated many core ideas. Thank you for doing this!

As a matter of practical implementation, I think it's a good idea to always have a draft of official, approved statements of capabilities that can be rehearsed by any individual who may find themselves in a situation where they need to discuss them. These statements can be thoroughly vetted for second- and higher-order information leakage ahead of time, instead of trying to evaluate in real-time what their statements might reveal. It can be counterproductive in many circumstances to only be able to say "I can't talk about that". It also gives people a framework to practice this skill ahead of time in a lower-stakes environment, and the more people who are already read in at a classification level have a chance to vet the statement, the better the chance of catching issues.

The downside of formalizing this process is that you end up with a repository of highly sensitive information, but it seems obvious that you want to practice with weapons and keep them in a safe, rather than just let everyone run around throwing punches with no training.

Comment by elspood on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-07-31T15:05:46.313Z · LW · GW

I'm glad you found it useful, even in this form. If the thing you're working on is something you could share, I'd be happy to offer further assistance, if you like.

Comment by elspood on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-06-27T21:28:09.146Z · LW · GW

Obviously this can't be done justice in a single comment, but here are some broad pointers that might help show the shape of the solution:

  • Israeli airport security focuses on behavioral cues, unpredictable questioning, and profiling. That reflects a somewhat extreme threat model, with very different base rates to account for (but also much lower traffic volume).
  • Reinforced cockpit doors address the hijackers-with-guns-and-knives scenario, and are a fully general, no-brainer control.
  • Good police work and better coordination in law enforcement are commonly cited, e.g. in the context of the 9/11 hijackings; both operate before anyone even gets to an airport.

In general, if the airlines had responsibility for security you would see a very different set of controls than what you get today, where security is an externality run by an organization with very strong "don't do anything you can get blamed for" political incentives. In an ideal world, you could have one airline catering to paranoiacs who want themselves and their fellow passengers to undergo extreme screening, another for people who have done the math, and most airlines in the middle phasing into nominal gate screening procedures that don't make them look to their customers like they don't care (which the math largely says they shouldn't).

A thought experiment: why is there no equivalent bus/train station security to what we have at airports? And what are the outcomes there?

Comment by elspood on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-06-23T15:55:32.807Z · LW · GW

I appreciate the nudge here to put some of this into action. I hear alarm bells when thinking about formalizing a centralized location for AI safety proposals and information about how they break, but my rough intuition is that if there is a way these can be scrubbed of descriptions of capabilities which could be used irresponsibly to bootstrap AGI, then this is a net positive. At the very least, we should be scrambling to discuss safety controls for already public ML paradigms, in case any of these are just one key insight or a few teraflops away from being world-ending.

I would like to hear from others about this topic, though; I'm very wary of being at fault for accelerating the doom of humanity.

Comment by elspood on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-06-23T15:47:54.650Z · LW · GW

My project seems to have expired from the OWASP site, but here is an interactive version that should have most of the data:

https://periodictable.github.io/

You'll need to mouse over the elements to see the details, so it's not really mobile friendly, sorry.

I agree that linters are a weak form of automatic verification that is actually quite valuable. You can get a lot of mileage out of simply blacklisting unsafe APIs, and a little more out of clever pattern matching.
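
As a minimal sketch of the blacklist approach (my own toy illustration in Python, not any particular production linter; the flagged APIs are just common examples), something like this already catches a surprising amount:

```python
import re
import sys

# Toy blacklist of APIs that are easy to misuse; a real linter's list would be
# much longer and tuned per language and framework.
UNSAFE_PATTERNS = {
    r"\bstrcpy\s*\(": "unbounded copy; prefer a length-checked alternative",
    r"\bgets\s*\(": "no buffer size; prefer fgets",
    r"\beval\s*\(": "avoid eval on anything derived from user input",
    r"\bos\.system\s*\(": "shell injection risk; prefer subprocess.run with a list",
}

def lint(path: str) -> int:
    """Report every line that matches a blacklisted API; return the count."""
    findings = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            for pattern, advice in UNSAFE_PATTERNS.items():
                if re.search(pattern, line):
                    print(f"{path}:{lineno}: matches {pattern!r}: {advice}")
                    findings += 1
    return findings

if __name__ == "__main__":
    total = sum(lint(path) for path in sys.argv[1:])
    sys.exit(1 if total else 0)
```

The "clever pattern matching" part is mostly about cutting false positives (e.g. ignoring matches inside comments or test fixtures), which is where real linters spend most of their complexity.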

Comment by elspood on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-06-23T02:00:05.036Z · LW · GW

I would say that some formal proofs are actually impossible, but would agree that software with many (or even all) of the security properties we want could actually have formal-proof guarantees. I could even see a path to many of these proofs today.

While the intent of my post was to draw parallel lessons from software security, I actually think alignment is an oblique or orthogonal problem in many ways. I could imagine timelines in which alignment gets 'solved' before software security. In fact, I think survival timelines might even require anyone who might be working on classes of software reliability that don't relate to alignment to actually switch their focus to alignment at this point.

Software security is important, but I don't think it's on the critical path to survival unless somehow it is a key defense against takeoff. Certainly many imagined takeoff scenarios are made easier if an AI can exploit available computing, but I think the ability to exploit physics would grant more than enough escape potential.

Comment by elspood on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-06-22T19:07:57.279Z · LW · GW

The halting problem only makes it impossible to write a program that can analyze a piece of code and then reliably say "this is secure" or "this is insecure".

It would be nice to be able to have this important impossible thing. :)

I think we are trying to say the same thing, though. Do you agree with this more concise assertion?

"It's not possible to make a high confidence checker system that can analyze an arbitrary specification, but it is probably possible (although very hard) to design systems that can be programmatically checked for the important qualities of alignment that we want, if such qualities can also be formally defined."

Comment by elspood on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-06-22T16:47:25.976Z · LW · GW

I would agree that some people figured this out faster than others, but the analogy is also instructive here: if even a small community like the infosec world has a hard time percolating information about failure modes and how to address them, we should expect the average ML engineer to be doing very unsafe things for a very long time by default.

To dive deeper into the XSS example, I think even among those that understood the output encoding and canonicalization solutions early, it still took a while to formalize the definition of an encoding context concisely enough to be able to have confidence that all such edge cases could be covered.
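
To make the "encoding context" point concrete, here's a minimal sketch (my own illustration using the Python standard library, not any specific framework's API) of why the same untrusted value needs a different encoder depending on where it lands:

```python
import html
import json
import urllib.parse

def encode_for_html_body(value: str) -> str:
    # Context: text between tags, e.g. <p>{value}</p>
    return html.escape(value)

def encode_for_html_attribute(value: str) -> str:
    # Context: inside a quoted attribute, e.g. <input value="{value}">
    return html.escape(value, quote=True)

def encode_for_js_string(value: str) -> str:
    # Context: a string literal inside a <script> block. json.dumps escapes
    # quotes, backslashes, and control characters, but real encoders handle
    # further edge cases (like "</script>" breakouts) on top of this.
    return json.dumps(value)

def encode_for_url_parameter(value: str) -> str:
    # Context: a query-string parameter, e.g. /search?q={value}
    return urllib.parse.quote(value, safe="")

payload = '"><script>alert(1)</script>'
print(encode_for_html_body(payload))      # &quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;
print(encode_for_url_parameter(payload))  # %22%3E%3Cscript%3Ealert%281%29%3C%2Fscript%3E
```

Getting any one encoder right is the easy part; the hard part, as noted above, was pinning down the full set of contexts (and nested contexts) an application can write into.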

It might be enough to simply recognize an area of alignment that has dragons and let the experts safely explore the nature and contours of those dragons, but you probably couldn't build a useful web application that doesn't display user-influenceable input. I think trying to get the industry to stop building even obviously dragon-infested things is part of what has gotten Eliezer so burned out and pessimistic.

Comment by elspood on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-06-22T16:33:56.991Z · LW · GW

I think you make good points generally about status motives and obstacles for breakers. As counterpoints, I would offer:

  • Eliezer is a good example of someone who built a lot of status on the back of "breaking" others' unworkable alignment strategies. I found the AI Box experiments especially enlightening in my early days.
  • There are lots of high-status breakers, and lots of independent status-rewarding communities around the security world. Some of these are whitehat/ethical, like leaderboards for various bug bounty programs, OWASP, etc. Some of them not so much so, like Blackhat/DEFCON in the early days, criminal enterprises, etc.

Perhaps here is another opportunity to learn lessons from the security community about what makes a good reward system for the breaker mentality. My personal feeling is that poking holes in alignment strategies is easier than coming up with good ones, but I'm also aware that thinking breaking is easy probably commits some quantity of typical mind fallacy. Thinking about how things break, or how to break them intentionally, is probably a skill that needs a lot more training in alignment. Or at least we need a way to attract skilled breakers to alignment problems.

I find it to be a very natural fit to post bounties on various alignment proposals to attract breakers to them. Keep upping the bounty, and eventually you have a quite strong signal that a proposal might be workable. I notice your experience of offering a personal bounty does not support this, but I think there is a qualitative difference between a bounty leaderboard with public recognition and a large pipeline of value that can be harvested by a community of good breakers, and what may appear to be a one-off deal offered by a single individual with unclear ancillary status rewards.

It may be viable to simply partner with existing crowdsourced bounty program providers (e.g. BugCrowd) to offer a new category of bounty. Traditionally, these services have focused on "pen-test" style bounties, doing runtime testing of existing live applications. But I've long been saying there should be a market for crowdsourced static analysis, and even design reviews, with a pay-per-flaw model.

Comment by elspood on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-06-22T06:25:21.632Z · LW · GW

Many! Thanks for sharing. This could easily turn into its own post.

In general, I think this is a great idea. I'm somewhat skeptical that this format would generate deep insights; in my experience successful Capture the Flag / wargames / tabletop exercises work best in the form where each group spends a lot of time preparing for their particular role, but opsec wargames are usually easier to score, so the judge role makes less sense there. That said, in the alignment world I'm generally supportive of trying as many different approaches as possible to see what works best.

Prior to reading your post, my general thoughts about how these kind of adversarial exercises relate to the alignment world were these:

  • The industry thought leaders usually have experience as both builders and breakers; some insights are hard to gain from just one side of the battlefield. That said, the industry benefits from folks who spend the time becoming highly specialized in one role or the other, and the breaker role should be valued at least as much as, if not more than, the builder role. (In the case of alignment, breakers may be the only source of failure data we can safely get.)
  • The most valuable tabletop exercises that I was a part of spent at least as much time analyzing the learnings as the exercise itself; almost everyone involved will have unique insights that aren't noticed by others. (Perhaps this points to the idea of having multiple 'judges' in an alignment tournament.)
  • Non-experts often have insights or perspectives that are surprising to security professionals; I've been able to improve an incident response process based on participation from other teams (HR, legal, etc.) almost every time I've run a tabletop. This is probably less true for an alignment war game, because the background knowledge required to even understand most alignment topics is so vast and specialized.
  • Unknown unknowns are a hard problem. While I think we are a long way away from having builder ideas that aren't easily broken, it's going to be a significant danger to have breakers run out of exploit ideas and mistake that for a win for the builders.
  • Most tabletop exercises are focused on realtime response to threats. Builder/breaker war games like the DEFCON CTF are also realtime. It might be a challenge to create a similarly engaging format that allows for longer deliberation times on these harder problems, but it's probably a worthwhile one.

Comment by elspood on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-06-22T01:51:22.918Z · LW · GW

I definitely wouldn't rule out the possibility of being able to formally define a set of tests that would satisfy our demands for alignment. The most I could say with certainty is that it's a lot harder than eliminating software security bug classes. But I also wouldn't rule out the possibility that an optimizing process of arbitrarily strong capability simply could not be aligned, at least to a level of assurance that a human could comprehend.

Thank you for these additional references; I was trying to anchor this article with some very high-level concepts. I very much expect that to succeed we're going to have to invent and test hundreds of formalisms to be able to achieve any kind of confidence about the alignment of a system.

Comment by elspood on What on Earth is a Series I savings bond? · 2021-12-20T01:06:40.650Z · LW · GW

I got part of the way through the process and then got stuck, but my situation may not be typical.

  • These bonds have to be purchased directly from the treasury, with an account at Treasury Direct.
  • Creation of a Treasury Direct account requires mailing in a form that has to be certified by a specific bank certifying agent in the US. A regular notary service is not accepted.
  • As far as I can tell, an equivalent certification service isn't available outside the country.

Comment by elspood on RandomWalkNFT: A Game Theory Exercise · 2021-11-12T22:25:48.852Z · LW · GW

I shudder to imagine the mutual funds created to fund bids on this thing.

How hard do you have to squint to not see this thing as pyramid-shaped? It's like Sierpinski's pyramid: fractally a scam, a scam at every conceivable resolution.

Comment by elspood on RandomWalkNFT: A Game Theory Exercise · 2021-11-12T21:51:05.184Z · LW · GW

Actually, the worst case would be if the minting price increases more slowly than the value of half the pool grows. Then every next bid would still be "in the money", and whoever doesn't go bankrupt first wins. This thing could eat the whole world. Terrible. Kill it with fire.

Comment by elspood on RandomWalkNFT: A Game Theory Exercise · 2021-11-12T21:47:12.425Z · LW · GW

Well, maybe I'm missing something, but the game theory doesn't seem that interesting to me. And calling it a 'return on investment' seems a bit generous for what is really just a game of blockchain chicken. In fact, it might be as crazy as a dollar auction, where people end up bidding more than what half the accumulated contract is worth due to sunk cost fallacy or other irrational behaviors.

Either way, you're not really buying anything of value here: you're just betting that the auction gets so little attention that you can walk away with free money, or else you're financing someone else's eventual bad decisionmaking (possibly your own).

You say "which in theory should increase the value of every minted NFT so far", but I don't see how. What additional value is added to a previously purchased NFT by someone purchasing the next one? If anything, each incremental minting makes the previous one worthless (unless you ascribe some inherent value to owning an arbitrary NFT). In fact, every NFT in this game is a badge of shame except the one that that wins the pot, and even that one could be shameful if it cost more than the pot was worth.

It seems especially insidious that the game perpetuates itself by trying to get people to restart the bidding war with the other half of the prize pool. You didn't say whether the price of minting resets after half the pool gets claimed, but either way it's terrible: either the price resets, making the next bidding war even more furious and tying up even more people and funds, or it doesn't, making it more likely that the other half just sits there unclaimed forever because minting a bid costs more than you'd see in return. I don't see how this ever ends well for anyone.

On the other hand, one could view this game as punishing greed at a meta level: at first it looks like you get a free 242x return, but at best you realize you've only thrown away $500; at worst you end up much deeper in the hole you tried to dig your way out of. Not sure I approve of the ethics of this punishment, though.

Anyway, it seems clear to me that the correct strategy is to not mint the next NFT. Get some popcorn and watch; the entertainment value to others is the only real value in RandomWalk.

Comment by elspood on RandomWalkNFT: A Game Theory Exercise · 2021-11-12T20:33:57.130Z · LW · GW

What happens to the other half? This seems underspecified as you've described it.

Comment by elspood on 2021 Darwin Game - Everywhere Else · 2021-10-07T01:48:29.879Z · LW · GW

In the interest of science, I ran 10 more simulations with our submitted population. This is not to open a can of worms or to challenge the results in any way - we all knew we had to win on the first try!

https://drive.google.com/file/d/1mSqaNlo5KT9l9vmY3ckd8KSTXA0xOz0u/view

Some things that I observed:

  • The results were highly sensitive to randomness. Almost no species survived consistently.
  • Sometimes defenseless creatures survived and sometimes they didn't.
  • LeavyTanky (ViktorThink) survived basically every time. Looks like there was no competition for the invincible leaf eater niche in the Rainforest (though plenty of leaf eaters abounded). I would say this is the strongest creature in the field of submissions based on my tests.
  • Usually an apex predator survived (10 attack, 10 speed). Often it was the most successful creature in terms of total energy across all the biomes it spread to. Antivenom usually didn't seem worth it on an apex predator, but the Cheetah had it and did well in several runs.
  • Venomous creatures almost never survived.
  • As a class, armored tanks were the majority of survivors. Occasionally a speeder would survive, but much less commonly.
  • Usually, some mid-range tanks survived as well (~6 armor). This was often enough to stay ahead of predators while outcompeting invincible tanks.
  • On average only about 15 species survived past generation 1000. 30 species NEVER survived this long together. If you combine species occupying the same niche, this number was barely more than 10.
  • The tundra was always barren. The desert was always taken over by a single species.
  • I was surprised to see the Dump omnivores survive many times (Garbage Disposal, and 2-8-0 algae-...). Creatures with more than a few food sources generally didn't do well, but the formula seemed decent in the Dump.
  • Sometimes the coconuts got eaten! Not often, though.
  • Often a 1 attack, 1 speed omnivore survived. Usually these took the place of defenseless creatures, but in one case they coexisted.

It might be fun to compete to design the creature that does the best against the 555-species field. I might also do some more experiments/analysis when I have some time - let me know if there's anything you're curious about.

Congrats to all the winners! Already looking forward to next year. Thanks lsusr for running this again this year!

Comment by elspood on 2021 Darwin Game - Everywhere Else · 2021-10-07T00:26:29.394Z · LW · GW

Here are our Brier scores for our predictions:

https://docs.google.com/spreadsheets/d/1qhuACrtD0esgCqz8rQvYcZOC0I1y1l66/edit#gid=225287990

The defenseless creature result really surprised most of us. Well done, aphyer, you knew what was up.

Comment by elspood on 2021 Darwin Game - Human Garbage Dump · 2021-10-06T14:47:12.905Z · LW · GW

Of all the things, the coconuts were by far the most difficult to get anything to survive on. In my simulations, usually the coconut eaters that survived were also eating something else.

In theory, coconuts should sustain a 13.1 E creature; in practice, with such a small food source, a creature that size gets outcompeted at first by much smaller organisms, which then get hunted to extinction by predators.

Comment by elspood on 2021 Darwin Game - River · 2021-10-06T14:27:51.631Z · LW · GW

Ah, I read the wrong line. So yeah, we submitted the exact same creature.

There were definitely reliably BAD creatures, and certainly some reliably good ones, but a lot of variance based on the overall makeup of the population. I certainly didn't expect so many total creatures to be submitted; there was a lot more variability in results with 500-creature populations. In 5000-creature populations, basically the only thing that ever survived was invincibles.

With this size population, I don't think it's a coincidence that your minimal invincible survived - and certainly wasn't just luck that you arrived at its design. Give yourself SOME credit. :)

Comment by elspood on 2021 Darwin Game - River · 2021-10-06T12:56:38.110Z · LW · GW

I submitted the exact same 10 speed leaf eater that you did, I just started it in the Temperate Forest. Luck of the draw that yours got here first, I guess.

Comment by elspood on 2021 Darwin Game - Human Garbage Dump · 2021-10-06T12:50:26.391Z · LW · GW

Damn, now I'm upset I didn't spend more time thinking of a good name. A brown bear isn't even a pure predator! Really wish I had called THIS one the Trash Panda, instead. :)

Comment by elspood on 2021 Darwin Game - Tundra · 2021-10-06T06:18:37.533Z · LW · GW

Wait, are you initializing and running each biome separately? I expected all biomes to be seeded at once with the complete set of submitted organisms.

Comment by elspood on The 2021 Less Wrong Darwin Game · 2021-10-01T13:16:28.894Z · LW · GW

My definition of "minimal invincibles" here:

0 ATK, 10 DEF, 1 SPD, Antivenom herbivore

OR

0 ATK, 0 DEF, 10 SPD herbivore

These definitely win in a field of hundreds of participants. In my simulations, they were outcompeted by "less" invincible creatures fitting the invincible prototypes with 20-50 participants (200-500 creatures). I hedged my bets with a few invincibles, some hard-to-kills, and some things I found surprisingly hard to kill.

Also, my daughter's creature, so she has a chance to embarrass us all. :)

Did anyone find a way to reliably crash the populations of non-invincibles with fewer than 200 creatures (a reasonable amount of confederates you could wrangle)?

Comment by elspood on The 2021 Less Wrong Darwin Game · 2021-09-29T22:14:13.472Z · LW · GW

Embarrassing story:

I spent a lot of time writing a fast simulator and testing all kinds of approaches. Today I let my daughter (8) design a species without really understanding the game mechanics...and it performed better than every other creature on the first try. Granted, I had to help her correct some obviously suboptimal choices, but still...let's just say my confidence is not high.

I'll precommit to suggesting a secondary scoring mechanism for bragging rights: not simply the highest total number of surviving organisms but the total energy of the organisms (population * base energy).

Good luck everyone!

Comment by elspood on The 2021 Less Wrong Darwin Game · 2021-09-29T11:28:36.548Z · LW · GW

Can you give a more specific deadline? What timezone?

Comment by elspood on The 2021 Less Wrong Darwin Game · 2021-09-25T21:29:45.682Z · LW · GW

It would also be kind of a pain in the ass to change! :)

Comment by elspood on The 2021 Less Wrong Darwin Game · 2021-09-25T20:45:22.005Z · LW · GW

Not what I'm seeing. Roamers start roaming before the encounters in each biome, then after every biome is processed, the roamers find a new home. So the roamers go a whole generation without competing or foraging. Is that not what was intended?

Comment by elspood on The 2021 Less Wrong Darwin Game · 2021-09-25T13:50:14.776Z · LW · GW

I thought the same thing at first, but I think if the interact method is called with only one argument, then that creature ends up foraging normally. Since spawning depends on creature size and reproduction depends on energy, it seems equally likely that each biome will have an even or an odd number of creatures after each generation. So this situation would happen whether roaming is occurring or not.

The tough situation is for carnivores; if they're the odd one out, they'll die, even if there are species that they could eat.

Comment by elspood on The 2021 Less Wrong Darwin Game · 2021-09-25T10:46:46.805Z · LW · GW

There is no initial check to see if a species can survive in its spawning biome. Obviously this doesn't matter for breathing, but species could live in the desert or tundra for free without the corresponding traits.

Comment by elspood on The 2021 Less Wrong Darwin Game · 2021-09-25T10:25:23.125Z · LW · GW

Ah, ok. So instead of competing in that generation, the individual roams.

Comment by elspood on The 2021 Less Wrong Darwin Game · 2021-09-25T03:56:57.033Z · LW · GW

If my understanding of the code is correct, when an organism successfully roams, it basically spawns another copy of itself, leaving the original behind to compete in the source biome. That organism isn't removed from the competition pool. Given the relatively low roaming rate, I'm not sure this makes a huge difference, but it doesn't seem like it should be intended behavior.

Comment by elspood on The 2021 Less Wrong Darwin Game · 2021-09-24T22:32:30.937Z · LW · GW

Can you elaborate on the winning condition? I expect most biomes will have surviving species; will that mean multiple winners, or will the ultimate winner be the species with the most total biomass? How long will the simulation be run? I can imagine stable equilibrium conditions with multiple survivors, even after an arbitrarily large number of simulation rounds.

Comment by elspood on The 2021 Less Wrong Darwin Game · 2021-09-24T12:36:38.514Z · LW · GW

Spelling: *dEtritus

Comment by elspood on Rationality Quotes Thread May 2015 · 2015-05-14T00:24:04.909Z · LW · GW

Reading this reply, I was immediately reminded of a situation described by Jen Peeples, I think in an episode of The Atheist Experience, about her co-pilot's reaction of prayer during a life-threatening helicopter incident. (This comment is all I could find as a reference.)

Unless your particular prayer technique is useful for quickly addressing emergency situations, you probably don't want to be in the habit of relying on it as a general practice. I think the "rubber duck" Socratic approach could still be useful, so this isn't a disagreement with your entire comment, just a warning about possible failure modes.

Comment by elspood on Interpersonal Entanglement · 2015-02-19T21:26:31.797Z · LW · GW

Isn't there a separate axis for every aspect of human divergence? Maybe this was already explicit in asking if there is anything more complicated than romance for "multiplayer" relationships, but really this problem seems fully general: politics, religion, food, or any other preference that has a distribution among humans could be a candidate for creating schism (or indeed all axes at once). "Catgirl for romance" is one very specific failure mode, but the general one could be called "an echo chamber for every mind".

The expected result (for a mind that knows the genesis of the catpeople) is that eventually the catpeople will get boring, but Fun Theory still ought to allow for exploration of that territory as long as it allows a safe path of retreat back into the world of other minds. The important thing here seems to be that we must never be allowed to have catpeople without knowing their true nature (which seems to be a form of wireheading).

Comment by elspood on Harry Potter and the Methods of Rationality discussion thread, part 25, chapter 96 · 2013-08-11T18:56:47.707Z · LW · GW

It was hard to muster a proper sense of indignation when you were confronting the same dignified witch who, twelve years and four months earlier, had given both of you two weeks' detention after catching you in the act of conceiving Tracey.

Given the fact that there is a Tracey, then that act of conception must have completed. So, either McGonagall caught them at exactly the right moment, or the Davises had just kept on going after they were caught...

No matter how it happened, this scene must have played out hilariously.

Comment by elspood on Philosophical Landmines · 2013-02-11T19:43:16.135Z · LW · GW

If consequentialism and deontology shared a common set of performance metrics, they would not be different value systems in the first place.

At least one performance metric that allows for the two systems to be different is: "How difficult is the value system for humans to implement?"

Comment by elspood on Pinpointing Utility · 2013-02-03T21:51:31.564Z · LW · GW

[edited out emotional commentary/snark]

  1. If you can't multiply B by a probability factor, then it's meaningless in the context of xB + (1-x)C as well. xB by itself isn't meaningless; it roughly means "the expected utility on a normalized scale between the utility of the outcome I least prefer and the outcome I most prefer" (see the sketch after this list). nyan_sandwich even agrees that 0 and 1 aren't magic numbers; they're just rescaled utility values.
  2. I'm 99% confident that that's not what nyan_sandwich means by radiation poisoning in the original post, considering the fact that comparing utilities to 0 and 1 is exactly what he does in the hell example. If you're not allowed to compare utilities by magnitude, then you can't obtain an expected utility by multiplying by a probability distribution. Show the math if you think you can prove otherwise.
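
A minimal sketch of the rescaling I have in mind (my notation, not necessarily nyan_sandwich's):

$$u_{\text{norm}}(X) = \frac{u(X) - u(\text{least preferred})}{u(\text{most preferred}) - u(\text{least preferred})}$$

This is a positive affine transformation, so it preserves both the preference ordering and the ordering of expected utilities like xB + (1-x)C; 0 and 1 aren't magic, they're just the endpoints of the chosen scale.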

It's getting hard to reference back to the original post because it keeps changing with no annotations to highlight the edits, but I think the only useful argument in the radiation poisoning section is: "don't use units of sandwiches, whales, or orgasms because you'll get confused by trying to experience them". However, I don't see any good argument for not even using Utils as a unit for a single person's preferences. In fact, using units of Awesomes seems to me even worse than Utils, because it's easier to accidentally experience an Awesome than a Util. Converting from Utils to unitless measurement may avoid some infinitesimal amount of radiation poisoning, but it's no magic bullet for anything.

Comment by elspood on Pinpointing Utility · 2013-02-02T20:24:02.571Z · LW · GW

I think what you mean to tell me is: "say 'my preferences' instead of 'my utility function'". I acknowledge that I was incorrectly using these interchangeably.

I do think it was clear what I meant when I called it "my" function and talked about it not conforming to VNM rules, so this response felt tautological to me.

Comment by elspood on Pinpointing Utility · 2013-02-02T19:54:28.678Z · LW · GW

I notice we're not understanding each other, but I don't know why. Let's step back a bit. What problem is "radiation poisoning for looking at magnitude of utility" supposed to be solving?

We're not talking about adding N to both sides of a comparison. We're talking about taking a relation where we are only allowed to know that A < B, multiplying B by some probability factor, and then trying to make some judgment about the new relationship between A and xB. The rule against looking at magnitudes prevents that. So we can't give an answer to the question: "Is the sandwich day better than the expected value of 1/400 chance of a whale day?"

If we're allowed to compare A to xB, then we have to do that before the magnitude rule goes into effect. I don't see how this model is supposed to account for that.

Comment by elspood on Pinpointing Utility · 2013-02-02T19:24:36.606Z · LW · GW

It's too late for me. It might work to tell the average person to use "awesomeness" as their black box for moral reasoning as long as they never ever look inside it. Unfortunately, all of us have now looked, and so whatever value it had as a black box has disappeared.

You can't tell me now to go back and revert to my original version of awesome unless you have a supply of blue pills whenever I need them.

If the power of this tool evaporates as soon as you start investigating it, that strikes me as a rather strong point of evidence against it. It was fun while it lasted, though.

Comment by elspood on Pinpointing Utility · 2013-02-02T19:07:06.779Z · LW · GW

Ooops, you tried to feel a utility. Go directly to type theory hell; do not pass go, do not collect 200 utils.

I don't think this example is evidence against trying to 'feel' a utility. You didn't account for scope insensitivity and the qualitative difference between the two things you think you're comparing.

You need to compare the feeling of the turtle thrown against the wall to the cumulative feeling when you think about EACH individual beheading, shooting, orphaned child, open grave, and every other atrocity of the genocide. Thinking about the vague concept "genocide" doesn't use the same part of your brain as thinking about the turtle incident.

Comment by elspood on Pinpointing Utility · 2013-02-02T08:43:13.852Z · LW · GW

What I mean by "normalized" is that you're compressing the utility values into the range between 0 and 1. I am not aware of another definition that would apply here.

Your rule says you're allowed to compare, but your other rule says you're not allowed to compare by magnitude. You were serious enough about this second rule to equate it with radiation death.

You can't apply probabilities to utilities and be left with anything meaningful unless you're allowed to compare by magnitude. This is a fatal contradiction in your thesis. Using your own example, you assign a value of 1 to whaling and 1/500 to the sandwich. If you're not allowed to compare the two using their magnitude, then you can't compare the utility of 1/400 chance of the whale day with the sandwich, because you're not allowed to think about how much better it is to be a whale.
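
Spelling out the arithmetic of your own example (treating the ordinary baseline day as the 0 anchor, which is how I read the post):

$$EU(\text{whale gamble}) = \tfrac{1}{400}\cdot 1 + \tfrac{399}{400}\cdot 0 = 0.0025 > 0.002 = \tfrac{1}{500} = u(\text{sandwich day})$$

There is no way to reach that ">" without using the magnitudes of the two numbers, which is exactly the operation the radiation rule forbids.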

Comment by elspood on Pinpointing Utility · 2013-02-02T08:20:16.053Z · LW · GW

No, I mean if my utility function violates transitivity or other axioms of VNM, I more want to fix it than to throw out VNM as being invalid.

Comment by elspood on Pinpointing Utility · 2013-02-02T03:18:29.528Z · LW · GW

I think I have updated slightly in the direction of requiring my utility function to conform to VNM and away from being inclined to throw it out if my preferences aren't consistent. This is probably mostly due to smart people being asked to give an example of a circular preference and my not finding any answer compelling.

Expectation. VNM isn't really useful without uncertainty. Without uncertainty, transitive preferences are enough.

I think I see the point you're trying to make, which is that we want to have a normalized scale of utility to apply probability to. This directly contradicts the prohibition against "looking at the sign or magnitude". You are comparing 1/400 EU and 1/500 EU using their magnitudes, and jumping headfirst into the radiation. Am I missing something?

Comment by elspood on Pinpointing Utility · 2013-02-01T23:05:46.766Z · LW · GW

That was one of the major points. Do not play with naked utilities. For any decision, find the 0 anchor and the 1 anchor, and rank other stuff relative to them.

I understood your major point about the radioactivity of the single real number for each utility, but I got confused by what you intended the process to look like with your hell example. I think you need to be a little more explicit about your algorithm when you say "find the 0 anchor and the 1 anchor". I defaulted to a generic idea of moral intuition about best and worst, then only made it as far as thinking it required naked utilities to find the anchors in the first place. Is your process something like: "compare each option against the next until you find the worst and best?"

It is becoming clear from this and other comments that you consider at least the transitivity property of VNM to be axiomatic. Without it, you couldn't find what is your best option if the only operation you're allowed to do is compare one option against another. If VNM is required, it seems sort of hard to throw it out after the fact if it causes too much trouble.

What is the point of ranking other stuff relative to the 0 and 1 anchor if you already know the 1 anchor is your optimal choice? Am I misunderstanding the meaning of the 0 and 1 anchor, and it's possible to go less than 0 or greater than 1?

Comment by elspood on Pinpointing Utility · 2013-02-01T19:58:19.868Z · LW · GW

"Awesomeness" is IMO the simplest effective pointer to morality that we currently have, but that morality is still inconsistent and dynamic.

The more I think about "awesomeness" as a proxy for moral reasoning, the less awesome it becomes and the more like the original painful exercise of rationality it looks.

Comment by elspood on Pinpointing Utility · 2013-02-01T19:39:50.769Z · LW · GW

I've been very entertained by this framing of the problem - very fun to read!

I find it strange that you claim the date with Satan is clearly the best option, but almost in the same breath say that the utility of whaling in the lake of fire is only 0.1% worse. It sounds like your definition of clarity is a little bit different from mine.

On the Satan date, souls are tortured, steered toward destruction, and tossed in a lake of fire. You are indifferent to those outcomes because they would have happened anyway (we can grant this a premise of the scenario). But I very much doubt you are indifferent to your role in those outcomes. I assume that you negatively value having participated in torture, damnation, and watching others suffer, but it's not immediately clear if you had already done those things on the previous 78044 days.

Are you taking into account duration neglect? If so, is the pain of rape only slightly worse than burning in fire?

This probably sounds nitpicky; the point I'm trying to make is that computing utilities using the human brain has all kinds of strange artifacts that you probably can't gloss over by saying "first calculate the utility of all outcomes as a number, then compare all your numbers on a relative scale". We're just not built to compute naked utilities without reference anchors, and there does not appear to be a single reference anchor to which all outcomes can be compared.

Your system seems straightforward when only 2 or 3 options are in play, but how do you compare even 10 options? 100? 1000? In the process you probably do uncover examples of your preferences that will cause you to realize you are not VNM-compliant, but what rule system do you replace it with? Or is VNM correct and the procedure is to resolve the conflict with your own broken utility function somehow?

TL;DR: I think axiom #1 (utility can be represented as a single real number) is false for human hardware, especially when paired with #5.

Comment by elspood on Rationality Quotes January 2013 · 2013-01-29T01:02:45.431Z · LW · GW

Edited, thanks for the style correction.

I suspect you're probably right that more examples would make this more interesting, given the lack of upvotes. In fact, I probably found the quote relevant mostly because it more or less summed up the experience of my OWN life at the time I read it years ago.

I spent much of my youth being contrarian for contradiction's sake, and thinking myself to be revolutionary or somehow different from those who just joined the cliques and conformed, or blindly followed their parents, or any other authority.

When I realized that defining myself against social norms, or my parents, or society was really fundamentally no different from blind conformity, only then was I free to figure out who I really was and wanted to be. Probably related: this quote.