Pattern's Shortform Feed 2019-05-30T21:21:23.726Z · score: 13 (3 votes)
[Accidental Post.] 2018-09-13T20:41:17.282Z · score: -6 (1 votes)


Comment by pattern on What are information hazards? · 2020-02-19T04:36:46.589Z · score: 2 (1 votes) · LW · GW
Data hazard: Specific data, such as the genetic sequence of a lethal pathogen or a blueprint for making a thermonuclear weapon, if disseminated, create risk.

Dangerous blueprints. A generalization might include 'stability'.

It's interesting how it relates to false information. A failed implementation of 'true' nuclear reactor blueprints could also be dangerous (depending on the design). Some designs could carry more risk than others based on how likely the people handling them are to fail at making a safe reactor. (Dangerous Wooden Catapult - Plans not safe for children.)

Idea hazard: A general idea, if disseminated, creates a risk, even without a data-rich detailed specification.

This one may make more sense absent the 'true' criterion - telling true from false need not be trivial. (How would people who aren't nuclear specialists tell that a design for a nuke is flawed?) The difference between true and false w.r.t. possibly self-fulfilling prophecies isn't clear.

Also, antibiotics might be useful for creating antibiotic-resistant bacteria. (Not sure if such bacteria are more deadly to humans, all else equal - this makes categorization difficult: how can an inventor tell if their invention can be used for ill?)

Sometimes the mere demonstration that something (such as a nuclear bomb) is possible provides valuable information which can increase the likelihood that some agent will successfully set out to replicate the achievement.

This is a useful concept in its own right.

Attention hazard: mere drawing of attention to some particularly potent or relevant ideas or data increases risk, even when these ideas or data are already “known”.

The risk could also run in reverse - hiding evidence of a catastrophe could hinder its prevention, or countermeasures being developed (in time).

A mild/'non-hazardous' form might be making methods of paying attention to a thing less valuable, or bringing attention to things which if followed turn out to be dead ends.

(Exactly when and how to attend to and reduce potential information hazards is beyond the scope of this post; Convergence hopes to explore that topic later.)

I look forward to this work.

the principle of differential progress

from the linked post:

What we do have the power to affect (to what extent depends on how we define “we”) is the rate of development of various technologies and potentially the sequence in which feasible technologies are developed and implemented. Our focus should be on what I want to call differential technological development: trying to retard the implementation of dangerous technologies and accelerate implementation of beneficial technologies, especially those that ameliorate the hazards posed by other technologies.

An idea that seems as good and obvious as utilitarianism. But what if these things come in cycles? Technology A may be both positive and negative, but technology B which negates its harms is based on A. Slowing down tech development seems good before A arrives, but bad after. (This scenario implicitly requires that the poison has to be invented before the cure.)

[Thus, it is also possible to have a] Spoiler hazard: Fun that depends on ignorance and suspense is at risk of being destroyed by premature disclosure of truth.

Words like 'hazard' or 'risk' seem too extreme in this context. The effect can also be reversed - the knowledge that learning physics could enable you to reach the moon might serve to make the subject more, rather than less interesting. (The key point here is that people vary, which could be important to 'infohazards in general'. Perhaps some people acquiring the blueprints for a nuclear reactor wouldn't be dangerous because they wouldn't use them. Someone with the right knowledge (or in the right time and place) might be able to do more good with these blueprints, or even have less risk of harm; "I didn't think of doing that, but I see how it'd make the reactor safer.")

Terminology questions:

What is a minor hazard? (Info-paper cut doesn't sound right.)

What is the opposite of a hazard? (Info safeguard or shield sounds like it could refer to something that shields from info-hazards.)

As noted, an information hazard is “A risk that arises from the dissemination or the potential dissemination of (true) information that may cause harm or enable some agent to cause harm”

The opposite being a "noble lie".

E.g., writing a paper on a plausibly dangerous technology can be an information hazard even if it turns out to be safe after all.

This seems confused. It seems the relevant map-territory distinction here is "something could be an information hazard, without us knowing that it is."

The concept of information hazards relates to risks of harm from creating or spreading true information (not from creating or spreading false information).

By definition only - the hazards of information need not obey this constraint.

The concept is definitely very useful in relation to existential risks and risks from technological development, but can also apply in a wide range of other contexts, and at much smaller scales.

What is an infohazard seems relative. If information about how to increase health can also be used to negatively impact it, then whether or not something is an infohazard seems to be based on the audience - are they benign or malign?

Some information hazards risk harm only to the knower of the true information themselves, and as a direct result of them knowing the information. But many information hazards harm other people, or harm in other ways.

Can idea A be harmful to those that don't carry it? Can two ideas x and y exist such that both are true, but holders of one idea may be dangerous to holders of the other? Are there non-trivial cases where ideas q, r, and s are infohazards only as a whole? (A trivial case might be three parts to knowing how to build a nuke.)

Can a set of ideas together in part be an infohazard, but be harmless as a whole?

Information hazards are risks of harm, not necessarily guaranteed harms.

Information...harms? weapons? (Weapons are a guaranteed source of harm.) Thorns?

Comment by pattern on Training Regime Day 3: Tips and Tricks · 2020-02-19T00:36:35.262Z · score: 3 (2 votes) · LW · GW
(There's some relation to the sunk cost fallacy here, in the sense that theoretically you should search equally hard for understanding after you've already paid no matter how much it costed. However, human brains don't actually work like that, so I think that this extension of the concept is warranted.)

I've seen this argument, and while I acknowledge it might be true for some people, I have no reason to believe that this isn't mistaken correlation* - if you pay more for something you probably care more about it. (Though the Ikea effect seems plausible, I could see that being a) the same kind of correlation, and b) if you make something then you probably make it the way you want it.)

*Or advertising. "Our teaching program[1] sets the price high so you will learn a lot!"

Have skin in the game

There are 'teaching programs'[1] that have people pay afterward (if they get good enough results).

I'm not quite sure how to have skin in the game with respect to a sequence of blog posts, but it seems important enough to try. Some possible ways:

Here are two different ways of reading that - your readers committing to do the exercises, and you committing to publish (in a specific way). Both offer insights, and knowing which you're using might be informative. At a guess, you write one every day and publish it, or you wrote the whole thing in advance, or you have a buffer (that's smaller than the whole thing).

Between those two methods, one seems obvious for readers (right now) - don't build up a buffer, do each one as it is released. (This is easy to do with a regular release schedule.)

(I claim to teach techniques that work for me. At CFAR, they teach rationality techniques that work for nobody.)

That is an...interesting approach.

The ability to stop doing bad things means that trying things has almost no cost and a very high benefit (with a few notable exceptions).

Dangerous activities or addiction?

Unless you're going to cease having agency soon, you should probably spend much more of your time exploring than you currently do.

Building form sounds similar to building habits.

The knowledge that the researcher didn't have before seeing the rose and did have after seeing the rose is what I am referring to as the "phenomenology" of redness with respect to that researcher.

I thought it was experience/qualia, but I'm not too invested in those words.

If I ever give an example and you’re like “well I don’t think that would be good for me”, then remember that rationality is hair style agnostic.

This could have been more strongly/explicitly stated and also seems related to Should you reverse any advice you hear?[1]

It is important to remember that rationality techniques are not supposed to be weird.

They're supposed to be real. They're supposed to work. (If you put stock in a theory and it comes up with a weird answer, see if there's a cheap experiment, or look at why that answer seems weird - general theories may require more evidence, but a more complete understanding of why one thing works here but doesn't over there is valuable if correct (which should be carefully established - conflict between theories highlights an area to look at more closely).)

If you use a rationality technique properly, the thing that comes out of it should make sense.

To reuse an earlier example:

You try doing your hair for 30 minutes and find you enjoy this, but it doesn't make sense.

In meditating, the point is maybe something we call "enlightenment" that can only be understood by people who have seen it.

Activities where we're not sure what the point is are an interesting class.

Being able to sit still for a very long time gets you closer to the point of meditating (might be wrong about this one).

Can meditation be done while moving?


While the prior post was similarly abstract in subject matter, this post focused on presenting several things and was less detailed in steps.


[1] This is later referenced in the post under a different name.

Comment by pattern on In a rational world is there a place for ideology? · 2020-02-18T20:35:15.917Z · score: 2 (1 votes) · LW · GW


My limited perspective tells me in theory ideology would give you ideas to try and would bias potential solutions.

Theories also give you ideas to try. Is biasing potential solutions a good thing or a bad thing?


What is ideology?

I'll try to offer an answer here. (For the purposes of this comment "Ideology" is used negatively.)

Here's a frame:

A) Ideology: The way the world is + Things to do. An immutable 'Theory' that in the simplest case flows from one's self (multiple sources lead to complications and can involve integration and schisms), "not necessarily possessing any connection to reality" - it can say 'the sky is red' and treat that as a fact.


B) A model that generates claims is created (via some method). These claims are tested*, and models that produce false claims are rejected. If the process of claim generation integrates and reworks refuted theories, maybe "progress" can be made - or this just leads to an ensemble of probably overfitting theories that crash whenever a new (or old[1]) experiment is performed.

*Relevant quote for one way this can work: "He who decides the null hypothesis is king."

C) Theories are generated from data via some method. "Refutation" leads to revision, and maybe theories get more points if they make correct predictions in advance about experiments that have never been performed.

Under A, there are no constraints on theories - a theory can say anything at all.

Under B, a theory can say anything - except things that are "wrong". Any statement a theory makes that is later shown to be "wrong" means it is discarded/revised. The current pool of viable theories obeys the constraint "we don't know it's wrong". (Footnote 1 notes that this is incorrect - what is believed to have been shown via an experiment can change over time, especially as a result of evidence it was fake, not replicating, etc.)

Under C, theories don't just come with a bundle of "yes we've checked this and it was right/wrong, we haven't checked this yet, etc." These theories begin with evidence...but how can such a thing be shown via experiment? How do we know type C theories aren't just type B theories that later accumulated evidence? Does it matter?

The striking difference (as formulated here) between type A theories ("Ideologies") and everything else is that they don't have a connection to reality.

They can be seen as lazy theories - no requirements for predictions about reality, or that those predictions match reality. To be fair, if you were "absolutely certain" in a mathematical sense, then it would make sense to never change your mind. (Some argue that this is a basis for never being "absolutely certain" - but then how certain should one be that 1+1=2? An argument can also be made for methods that enable handling discontinuity, coming up with new theories, etc.)

But there's also the normative component - values. Are these immutable, or do they change? Are they based on 'truth' or something else?

If one values human lives, then one may consequently value things one believes are necessary or improve human lives. Let's say this includes clean water and cookies. But one later finds out water is necessary for human life, and cookies are bad for human health. In this toy model, the value of cookies has changed, but not the value of human lives. So human lives are judged good as an immutable part, and cookies/water judged based on consequence on the immutable value.

Part of this is based on what "is" - do people need water? cookies? Are these things good for them?

Part of this is purely "ought" - human lives are good. (Or the more complicated "good human lives are good".)

So what is "ideology" good for? It's good to know the truth, and it's good to know your values. Where questions of what is are concerned, replacing ideology with theory may be useful for finding the truth. Whatever framework is used for ought/handling values, acknowledging the possibility of being incorrect (whatever that means) seems to suggest the possibility of change, of learning. And a mind that never changes, if wrong, 'can never become right'. But what does it mean to be wrong about what ought to be?


[1] This argues for 'preservation', and rewinding - a "theory" refuted by one experiment, which doesn't replicate, whose result is then reversed by several clear experiments, 'should' 'come back'.

Or it supports a more complicated model incorporating "probabilities". For a simplified model:

After inspecting a coin, and finding it bears two faces:

Theory H says: This coin will come up heads when flipped.

Theory T says: This coin will come up tails when flipped.

These both seem reasonable, and equally likely, so we'll pretend we've seen both happen once. After the coin is flipped n times, if the number of heads and tails sums to n, then the weight for the theories/outcomes is h+1 : t+1, where h is the number of heads actually seen, and t is the number of tails seen.

(What should be done if the coin settles on the edge is less clear - a new theory may be required (Theory E). And if the point of the imaginary outcomes is just so some weight will be given to outcomes we consider 'possible' but haven't observed, then after they've been observed, should the imaginary outcomes be removed?)

This offers one way of doing things:

An experiment is a trial which each theory says will provide 1 count of evidence for them. After being performed, whichever theory was 'right' gets 1 more point. The weights that develop over time serve as an estimate of the outcome of future experiments - and the probability that a coin comes up heads or tails.
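The weighting scheme above (one imaginary observation per outcome, then one point per experiment) can be written down in a few lines. This is only a sketch of the toy model as described - the function name and example counts are made up for illustration. The resulting estimate is what's sometimes called Laplace's rule of succession.

```python
# Toy weighting model: each theory starts with one imaginary observation,
# and each experiment adds one real count to whichever theory was 'right'.
# The weights h+1 : t+1 then double as probability estimates.
def theory_weights(h: int, t: int) -> tuple[float, float]:
    """Estimate P(heads) and P(tails) from h observed heads and t observed tails."""
    total = h + t + 2  # +2 for the two imaginary observations
    return (h + 1) / total, (t + 1) / total

# After 7 observed heads and 3 observed tails,
# Theory H is weighted 8:4 over Theory T.
p_heads, p_tails = theory_weights(7, 3)  # (8/12, 4/12)
```

Note that before any flips, `theory_weights(0, 0)` gives (1/2, 1/2) - exactly the "pretend we've seen both happen once" starting point.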

This model doesn't include more complicated hypotheses like:

the coin will come up HTHTHTHT...repeating forever.

That count where so and so said the coin landed on an edge? That whole experiment was made up and never performed. (Or performed until that result was reached, and the prior experiments weren't recorded.)

Which leaves the question of how to handle them. If a result can be obtained via many 'experiments', how do we incorporate that evidence if we don't know the number of experiments?

Comment by pattern on Set Ups and Summaries · 2020-02-18T18:04:54.876Z · score: 2 (1 votes) · LW · GW
I’m currently pondering how much you can get out of this, and specifically if it’s fair to reject a work because it failed pre-reading.

This might vary by genre.

should be help to a different standard

held, or held to a different standard of help

What standards things should be held to is a normative question. It certainly makes sense to evaluate books differently based on this - writing with clarity makes it easier for the reader to get more out of it faster. Rather than being condensed into a single number score with a lot of other factors (4/5 stars, etc.), this can be a useful piece of information for recommendations (and reviews) to mention.

"I had to read this book twice because statements should have been in a different order at the beginning and end of chapters to indicate what the topic was/why it was important/why I should care"

is very different from:

"This book is filled with information, presented clearly and well. After you read it the first few times you learn from it each time, so you should re-read the book a few times to catch the things that use the knowledge you've obtained and build on it more, so you can learn everything this book has to teach."

Maybe my scattered opening and closing paragraphs should cause you to downgrade your assessment of these post (although if you could keep in mind what I’m capable of when I’m prioritizing idea transmission, that would be cool).

this post

I can separate comments relating to textual minutiae (this instead of these, held instead of help) from comments on content.

I’ll look like a real ass here if I don’t have a summary, but I’m still not sure what I’ve learned. I still think How to Read a Book is wrong to insist every book have a clearly defined Unity.

If a book isn't easy to summarize that's useful information, in addition to whether or not the book was useful.

Having a clearly defined Unity seems like one way a book can be good/valuable. (Good opening and closing paragraphs might naturally arise from or be easier to do if there's a Unity.) Perhaps it's useful to ask "What other ways can a book be valuable, which are independent of Unity, or run counter to it?" so one can come up with ways of making reading those kinds of books easier/more valuable (or just coming up with better ways of pre or post reading).

I’ve spent longer writing this and skimming the chapter than it would have taken to read it deeply, but that’s okay because it was a better use of my time.

Or how to get better schemes for skimming effectively, so there's more learning per unit of time.

Comment by pattern on Attainable Utility Preservation: Concepts · 2020-02-18T17:39:19.875Z · score: 2 (1 votes) · LW · GW
CCC says (for non-evil goals) "if the optimal policy is catastrophic, then it's because of power-seeking". So its contrapositive is indeed as stated.

That makes sense. One of the things I like about this approach is that it isn't immediately clear what else could be a problem, and that might just be implementation details or parameters: corrigibility from limited power only works if we make sure that power is low enough that we can turn it off; if the agent will acquire power when that's the only way to achieve its goal, rather than stopping at/before some limit, then it might still acquire power and be catastrophic*; etc.

*Unless power seeking behavior is the cause of catastrophe, rather than having power.

Sorry for the ambiguity.

It wasn't ambiguous. I meant to gesture at stuff like 'astronomical waste' (and waste on smaller scales) - areas where we do want resources to be used. This was already addressed at the end of your post:

So we can hope to build a non-catastrophic AUP agent and get useful work out of it. We just can’t directly ask it to solve all of our problems: it doesn’t make much sense to speak of a “low-impact singleton”.

-but I wanted to highlight the area where we might want powerful aligned agents, rather than AUP agents that don't seek power.

What do you mean by "AUP map"? The AU landscape?

That is what I meant originally, though upon reflection a small distinction could be made:

Territory: AU landscape*

Map: AUP map (an AUP agent's model of the landscape)

*Whether or not this is thought of as 'Territory' or a 'map', conceptually AUP agents will navigate (and/or create) a map of the AU landscape. (If the AU landscape is a map, then AUP agents may navigate a map of a map. There also might be better ways this distinction could be made - e.g. the AU landscape as a style/type of map, just as there are maps of elevation and topography.)

The idea is it only penalizes expected power gain.

Gurkenglas previously commented that they didn't think that AUP solved 'agents learning how to convince people/agents to do things'. While it's not immediately clear how an agent could happen to find out how to convince humans of anything (the super-intelligent persuader), if an agent obtained that power, its continuing to operate could constitute a risk. (Though further up this comment I brought up the possibility that "power seeking behavior is the cause of catastrophe, rather than having power." This doesn't seem likely in its entirety, but seems possible in part - that is, powerful without power seeking might not be as dangerous as powerful and power seeking.)

Comment by pattern on Attainable Utility Preservation: Concepts · 2020-02-18T00:33:35.150Z · score: 4 (2 votes) · LW · GW

It did have that "aha" effect for me. (The drawings and the calligraphy were also amazing.)

Comment by pattern on Attainable Utility Preservation: Concepts · 2020-02-18T00:31:32.144Z · score: 4 (2 votes) · LW · GW

I liked this post, and look forward to the next one.

More specific and critical commentary (it seems it is easier to notice surprise than agreement):

(With embedded footnotes)


If the CCC is right, then if power gain is disincentivised, the agent isn't incentivised to overfit and disrupt our AU landscape.

(The CCC didn't make reference to overfitting.)


If A is true then B will be true.


If A is false B will be false.

The conclusion doesn't follow from the premise.


Without even knowing who we are or what we want, the agent's actions preserve our attainable utilities.

Note that preserving our attainable utilities isn't a good thing, it's just not a bad thing.

Issues: Attainable utilities indefinitely 'preserved' are wasted.

Possible issues: If an AI just happened to discover a cure for cancer, we'd probably want to know the cure. But if an AI didn't know what we wanted, and just focused on preserving utility*, then (perhaps as a side effect of considering both that we might want to know the cure, and might not want to know the cure) it might not tell us, because that preserves utility. (The AI might operate on a framework that distinguishes between action and inaction, in a way that means it doesn't do things that might be bad, at the cost of not doing things that might be good.)

*If we are going to calculate something and a reliable source (which has already done the calculation) tells us the result, we can save on energy (and preserve resources that can be converted into utility) by not doing the calculation. In theory this could include not only arithmetic, but simulations of different drugs or cancer treatments to come up with better options.


We can tell it:

Is this a metaphor for making an 'agent' with that goal, or actually creating an agent that we can give different commands to and switch out/modify/add to its goals? (Why ask it to 'make paperclips' if that's dangerous, when we can ask it to 'make 100 paperclips'?)


Narrowly improve paperclip production efficiency <- This is the kind of policy AUP_conceptual is designed to encourage and allow. We don't know if this is the optimal policy, but by CCC, the optimal policy won't be catastrophic.

Addressed in 1.


Imagine I take over a bunch of forever inaccessible stars and jumble them up. This is a huge change in state, but it doesn't matter to us.

It does a little bit.

It means we can't observe them for astronomical purposes. But this isn't the same as losing a telescope looking at them - it's (probably) permanent, and maybe we learn something different from it. We learn that stars can be jumbled up. This may have physics/stellar engineering consequences, etc.


AUP_conceptual solves this "locality" problem by regularizing the agent's impact on the nearby AU landscape.

Nearby from its perspective? (From a practical standpoint, if you're close to an airport you're close to a lot of places on earth, that you aren't from a 'space' perspective.)


For past-impact measures, it's not clear that their conceptual thrusts are well-aimed, even if we could formalize everything correctly. Past approaches focus either on minimizing physical change to some aspect of the world or on maintaining ability to reach many world states.

If there's a limited amount of energy, then using energy limits the ability to reach many world states - perhaps in a different sense than above. If there's a machine that can turn all pebbles into something else (obsidian, precious stones, etc.) but it takes a lot of energy, then using up energy limits the number of times it can be used. (This might seem quantifiable - moving the world* from containing 101 units of energy to 99 units affects how many times the machine can be used, if it requires 100, or 10, units per use. But this isn't robust against random factors increasing or decreasing the available energy, or against future improvements in the energy efficiency of the machine - if the cost is brought down to 1 unit of energy, then using up 2 units prevents it from being used twice.)

*Properly formalizing this should take a lot of other things into account, like 'distant' and notions of inaccessible regions of space, etc.

Also, the agent might be concerned with flows rather than actions.* We have an intuitive notion that 'building factories increases power', but what about redirecting a river/stream/etc. with dams, or digging new paths for water to flow? What does the agent do if it unexpectedly gains power by some means, or realizes its paperclip machines can be used to move strawberries/make a copy of itself which is weaker but less constrained? Can the agent make a machine that makes paperclips/make making paperclips easier?

*As a consequence of this being a more effective approach - it makes certain improvements obvious. If you have a really long commute to work, you might wish you lived closer to your work. (You might also be aware that houses closer to your work are more expensive, but humans are good at picking up on this kind of low hanging fruit.) A capable agent that thinks about process, seeing 'opportunities to gain power', is of some general concern - in this case because an agent that tries to minimize reducing/affecting** other agents' attainable utility, without knowing/needing to know about other agents, is somewhat counterintuitive.

**It's not clear if increasing shows up on the AUP map, or how that's handled.


Therefore, I consider AUP to conceptually be a solution to impact measurement.
Wait! Let's not get ahead of ourselves! I don't think we've fully bridged the concept/execution gap.
However for AUP, it seems possible - more on that later.

I appreciate this distinction being made. A post that explains the intuitions behind an approach is very useful, and my questions about the approach may largely relate to implementation details.


AUP aims to prevent catastrophes by stopping bad agents from gaining power to do bad things, but it symmetrically impedes otherwise-good agents.

A number of my comments above were anticipated then.

Comment by pattern on Attainable Utility Preservation: Concepts · 2020-02-18T00:31:07.417Z · score: 2 (1 votes) · LW · GW

For reference and ease of quoting, this comment is a text only version of the post above. (It starts at "Text:" below.) I am not the OP.


It's not clear how to duplicate the color effect* or cross words out**, so that hasn't been done. Instead crossed out words are followed by "? (No.)", and here's a list of some words by color to refresh the color/concept relation:

Blue words:

Power/impact/penalty/importance/respect/conservative/catastrophic/distance measure/impact measurement

Purple words:

incentives/actions/(reward)/expected utility/complicated human value/tasks


Last time on reframing impact:


Catastrophic Convergence Conjecture:

Unaligned goals tend to have catastrophe-inducing optimal policies because of power-seeking incentives

If the CCC is right, then if power gain is disincentivised, the agent isn't incentivised to overfit and disrupt our AU landscape.

Without even knowing who we are or what we want, the agent's actions preserve our attainable utilities.

We can tell it:

Make paperclips


Put that strawberry on the plate


Paint the car pink


but don't gain power.

This approach is called Attainable Utility preservation

We're focusing on concepts in this post. For now, imagine an agent receiving a reward for a primary task minus a scaled penalty for how much its actions change its power (in the intuitive sense). This is AUP_conceptual, not any formalization you may be familiar with.
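The "reward minus scaled penalty for power change" shape can be sketched as follows. This is purely illustrative - the power numbers stand in for some unspecified intuitive measure of the agent's power, and this is not the formalization the post is deliberately setting aside:

```python
# Illustrative sketch of the AUP_conceptual reward shape:
# primary task reward minus a scaled penalty for how much the
# agent's power changed as a result of its action.
def aup_conceptual_reward(task_reward: float,
                          power_before: float,
                          power_after: float,
                          penalty_scale: float) -> float:
    power_change = abs(power_after - power_before)
    return task_reward - penalty_scale * power_change

# An action earning 10 task reward while doubling the agent's power
# scores worse than one earning 6 with no power change (at scale 5).
r_grab = aup_conceptual_reward(10.0, 1.0, 2.0, penalty_scale=5.0)    # 5.0
r_modest = aup_conceptual_reward(6.0, 1.0, 1.0, penalty_scale=5.0)   # 6.0
```

The usage example shows why a paperclip-manufacturing agent under this scheme would prefer "narrowly improve efficiency" over "build lots of factories": the penalty eats the extra reward that power-gaining plans would otherwise earn.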

What might a paperclip-manufacturing AUP_conceptual agent do?

Build lots of factories? (No.)

Copy itself? (No.)

Nothing? (No.)

Narrowly improve paperclip production efficiency <- This is the kind of policy AUP_conceptual is designed to encourage and allow. We don't know if this is the optimal policy, but by CCC, the optimal policy won't be catastrophic.

AUP_conceptual dissolves thorny issues in impact measurement.

Is the agent's ontology reasonable?

Who cares.

Instead of regulating its complex physical effects on the outside world,

the agent is looking inwards at itself and its own abilities.

How do we ensure the impact penalty isn't dominated by distant state changes?

Imagine I take over a bunch of forever inaccessible stars and jumble them up. This is a huge change in state, but it doesn't matter to us.

AUP_conceptual solves this "locality" problem by regularizing the agent's impact on the nearby AU landscape.

What about butterfly effects?

How can the agent possibly determine which effects it's responsible for?

Forget about it.

AUP_conceptual agents are respectful and conservative with respect to the local AU landscape without needing to assume anything about its structure or the agents in it.

How can an idea go wrong?

There can be a gap between what we want and the concept, and then a gap between the concept and the execution.

For past-impact measures, it's not clear that their conceptual thrusts are well-aimed, even if we could formalize everything correctly. Past approaches focus either on minimizing physical change to some aspect of the world or on maintaining ability to reach many world states.

The hope is that in order for the agent to cause a large impact on us it has to snap a tripwire.

The problem is... well it's not clear how we could possibly know whether the agent can still find a catastrophic policy; in a sense the agent is still trying to sneak by the restrictions and gain power over us. An agent maximizing expected utility while actually minimally changing still probably leads to catastrophe.

That doesn't seem to be the case for AUP_conceptual.

Assuming CCC, an agent which doesn't gain much power doesn't cause catastrophes. This has no dependency on complicated human value, and most realistic tasks should have reasonable, high-reward policies that don't gain undue power.

So AUP_conceptual meets our desiderata:

The distance measure should:

1) Be easy to specify.

2) Put catastrophes far away.

3) Put reasonable plans nearby.

Therefore, I consider AUP to conceptually be a solution to impact measurement.

Wait! Let's not get ahead of ourselves! I don't think we've fully bridged the concept/execution gap.

However for AUP, it seems possible - more on that later.

Comment by pattern on Taking the Outgroup Seriously · 2020-02-16T18:20:19.090Z · score: 2 (1 votes) · LW · GW

(Contains an unendorsed model, as an example of a fake model.)

What do these sorts of claims all have in common? They don't take the outgroup seriously. Sure, there might well be some fringe radicals who actually

I disagree slightly with some of the examples. Here is what seems to generalize:

1. Some "ideas"/organizations exist that spread themselves. Intentionally or not, if lying offers an advantage, then over time selection among groups (as they arise and die out) may lead to widespread lies/systems of lies.

2. How does one determine whether or not one is dealing with fringe radicals? The label "outgroup" suggests we consider the group we are dealing with to be fringe radicals.

3. What if the outgroup doesn't "take themselves seriously"? Consider the following example*:

Model: Sex leads to closeness/intimacy. This effect becomes weaker if, after being activated, the people in question break up, etc.

There are groups that spread this to argue against sex before marriage.

But an alternative conclusion is that lots of sex is a good thing, as it enables people to become less overwhelmed by strong emotions which cause them to make rash decisions, which leads to marriages that don't last.

If this were a widespread response to the model, then maybe those groups would stop spreading it, because they are using it to argue for something that they value (or against something they anti-value).

While the above is a hypothetical, it points at a phenomenon that seems to be widespread - in which groups (and individuals) are not arguing in good faith, and taking them seriously will lead one astray.

*If you remember what post this example is from, let me know so I can add a link to it.


If you go around thinking that those who oppose you are all idiots, or crazy people, or innately evil, or just haven't thought about the situation (unlike you, of course!)... well, I won't say that you'll always be wrong, but that sure doesn't seem like the best way to go about trying to form an accurate model of the world!

If it seems wrong because it involves postulating that there are two types of people, you and everyone else in the world, then that seems easily fixed, by accepting that the conditions observed occur in oneself. (Although this should really be a matter of empirical judgement rather than theory - why should the best way of going about forming an accurate model of the world seem like the best way, when so many people are wrong?)

  • Everyone is foolish.
  • Everyone is evil.
  • Everyone is "crazy".

Each of these could be a starting point for a more complicated model.

Are people crazy in predictable ways?

Is wisdom randomly distributed throughout the population such that people tend to be wise in one domain but foolish in others, or is wisdom/foolishness a general trait?

Does everyone go about achieving their aims in largely similar ways, such that whether someone is good or evil will depend entirely on circumstance and what people believe they have to gain, or is it largely a subconscious/unreflective phenomenon, or are people good and evil generally, or do people tend to be good in some areas but bad in others? And do those areas vary between people and change over time or with circumstance?

Comment by pattern on Deconfusing Logical Counterfactuals · 2020-02-16T03:20:58.281Z · score: 2 (1 votes) · LW · GW
Some people say this fails to account for the agent in the simulator, but it's entirely possible that Omega may be able to figure out what action you will take based on high-level reasoning, as opposed to having to run a complete simulation of you.

Unless you are the simulation?

In so far as the paraconsistent approach may be more convenient from an implementation perspective than the first, we can justify it by tying it to raw counterfactuals.

Like one might justify deontology in terms of consequentialism?

However, when "you" and the environment are defined down to the atom, you can only implement one decision.

Does QM enable 'true randomness' (generators)?

They fail to realize that they can't actually "change" their decision as there is a single decision that they will inevitably implement.

Or they fail to realize others can change their minds.

Comment by pattern on Deconfusing Logical Counterfactuals · 2020-02-16T03:19:48.467Z · score: 2 (1 votes) · LW · GW


Comment by pattern on Reference Post: Trivial Decision Problem · 2020-02-16T00:25:55.060Z · score: 2 (1 votes) · LW · GW

Some problems/posts are also about

a) implications which may or may not be trivial

b) what do you value? (If you can only take one box, and the boxes contain money and things harder to compare than money, which would you choose?)

Comment by pattern on It "wanted" ... · 2020-02-16T00:19:01.988Z · score: 2 (1 votes) · LW · GW

Can you give some examples?

Comment by pattern on Training Regime Day 1: What is applied rationality? · 2020-02-16T00:00:50.836Z · score: 3 (2 votes) · LW · GW

This comment consists solely of a different take* on the material of the OP, and contains no errors or corrections.

[*Difference not guaranteed, all footnotes are embedded, this comment is very long, 'future additions, warnings and alterations to attributes such as epistemic status may or may not occur', all...]


Take 1

Take 2

Take 3

(The response to (parts of) each take is in three parts: a, b, and c. [This is the best part, so stop after there if you're bored.])


Questions that may overlap with 'How to build an exo-brain?'

[I am not answering these questions. Don't get your hopes down, bury them in the Himalayas. (This is an idiom variant, literal burial of physical objects in the Himalayas may be illegal.)]

Take 1:


Sometimes “just checking” is infeasible to do at such a small scale.

Or what is feasible at small scale isn't particularly usable, though large scale coordination could enable cheap experiments.


When you find science insufficient for the task, applied rationality can help you make good decisions using information you already have.

I feel like this is re-defining what science is, to not include things that seem like they fall under it.


Compressed into a single sentence, applied rationality fills the gaps of science in the pursuit of truth.

I might have called science [a] pursuit of truth, though distinguishing between different implementations/manifestations of it may be useful, like a group pursuing knowledge, versus an individual. (Though if they're using similar/compatible formats, then possibly:

  • the individual can apply the current knowledge from the group, and the group's experiments
  • A bunch of individuals performing experiments and publishing, can be the same as a group, only missing aggregation
  • An individual can merge data/knowledge from a group with their own. (Similar to how, with the right licence, open source programs may be borrowed from, and improved upon by companies internally, but without improving the original source or returning these versions to the 'open' pool.)

Take 2:


Crucially, you have situationally bad advisors. When there is a tiger running at you at full speed, it is vital that you don’t consult your explicit reasoning advisor.

Crucially, you have 'slow' advisors, who can't be consulted quickly. (And presumably fast advisors as well.)

  • While you may remember part of a book, or a skill you've gained, things/skills you don't remember can't be used with speed, even if you know where to find them given time
  • While it may be quick to determine if a car is going to hit you while crossing a street, it may take longer to determine whether or not such a collision would kill you - longer than it would take the car to collide, or not collide, with you.


I claim that similarly to the imaginary monarch making decisions, most of the work that goes into making good decisions is choosing which sources of information to listen to. This problem is complicated by the fact that some sources of information are easier to query than others, but this problem is surmountable.

Most of the work that goes into making good decisions is choosing:

How long to make decisions, and

  • when to revisit them.*
  • which advisors to consult in that time

Managing the council. This can include:

  • Managing disagreements between council members
  • Changing the composition - firing councilors, hiring new ones (and seeing existing members grow, etc.)

*Including how long to take to make a decision. A problem which takes less time to resolve (to the desired degree) than expected is no issue, but a problem that takes longer may require revisiting how long should be spent on the problem (if it is important/revisiting how important it is).


Compressed into a single sentence, applied rationality is the skill of being able to select the proper sources of information during decision-making.

As phrased this addresses 2.b (above), though I'd stress both the short term and the long term.

Take 3:

You look at all the possible options

There are a lot of options. This is why 2.b focused on time. Unfortunately the phrase "Optimal stopping" already seems to be taken, and refers to a very different (apparent) framing of the hiring problem. Even if you have all information on all applicants, you have to decide who to hire, and hire them before someone else does! (Which is what motivates deciding immediately after getting an applicant, in the more common framing. A hybrid approach might be better - have a favorite food, look at a few options, or create a record so results aren't a waste.)
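For contrast, here is a small simulation of that more common optimal-stopping framing (pass on the first n/e applicants, then take the first one better than all seen so far; the random scoring and trial count are arbitrary choices for illustration):

```python
import math
import random

# Classic "observe then commit" rule: skip the first n/e applicants,
# then hire the first applicant better than everyone seen so far.
def hires_best(n, rng):
    applicants = [rng.random() for _ in range(n)]
    cutoff = int(n / math.e)
    best_seen = max(applicants[:cutoff])
    for score in applicants[cutoff:]:
        if score > best_seen:
            return score == max(applicants)  # did we hire the overall best?
    return applicants[-1] == max(applicants)  # forced to take the last one

rng = random.Random(0)
trials = 2000
success_rate = sum(hires_best(100, rng) for _ in range(trials)) / trials
```

Over many trials, `success_rate` comes out near 1/e ≈ 0.37, the classic result for this rule - which is the framing the comment above is contrasting with the all-information-in-hand version.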

You decide that “rationality” is bunk and you should go with your intuition in the future.
This example might seem a bit contrived (and it is), but the general principle still holds.

So someone samples (tries) a thing once to determine if a method is good, but in applying the method doesn't sample at all. Perhaps extracting general methods from existing advisors/across old and new potential advisors is the way to go.

If you think that being [X] doesn’t work because [Y], then [try Z : X taking Y into account].
Compressed into a single sentence, applied rationality is a system of heuristics/techniques/tricks/tools that helps you increase your values, with no particular restriction on what the heuristics/techniques/tricks/tools are allowed to be.

That is very different from how I thought this was going to go. Try anything*, see what works, while keeping constraints in mind. This seems like good advice (though long term and short term might be important to 'balance'). The continuity assumption is interesting:

Don't consider points (the system as it is), but adapt it to your needs/etc**.

*The approach from the example/story seems to revolve around having a council and trying out adding one new councilor at a time.

**The amount of time till the restaurant closes may be less than the time till you'll be painfully hungry.


An exercise for the engaged reader is to find a friend and explain to them what applied rationality is to you.

I didn't see this coming. I do see writing as something to practice, and examining others' ideas "critically" is a start on that.

But I think what I've written above is a start for explaining what it means to me. Beyond that...

I might have a better explanation at the end of this "month", these 30 days or so.

This topic also relates to a number of things:

A) A blog/book that's being written about "meta-rationality"(/the practice/s of rationality/science (and studying it)):

B) Questions that may overlap with 'How to build an exo-brain?'

  • How to store information (paper is one answer. But what works best?)
  • How to process information*
  • How to organize information (ontology)
  • How to use information (like finding new applications)

*a) You learn that not all organisms are mortal. You learn that sharks are mortal.

How do you ensure that facts like these that are related to each other, are tracked with/linked to each other?

b) You "know" that everything is/sharks are mortal. Someone says "sharks are immortal".

How do you ensure that contradictions are noticed, rather than both held, and how do you resolve them?

(Example based on one from the replacing guilt series/sequence, that illustrated a more general, and useful, point.)

Thinking about the above, except with "information" replaced with other words like "questions" and "skills":


  • Storing questions may be similar to storing information.
  • But while information may keep, questions are clearly incomplete. (They're looking for answers.)
  • Overlaps with above.
  • Which questions are important, and how can one ensure that the answers survive?*


  • Practice (and growth)
  • It's not clear that this is a thing, or if it is, how it works. (See posts on Unlocking the Emotional Brain.)
  • Seems like a question about neuroscience, or 'how can you 'store' a skill you have now, so it's easier to re-learn/get back to where you are now (on some part of it, or the whole)?'*
  • This seems more applicable to skills you don't have, and deciding which new ones to acquire/focus on.

*This question is also important for after one's lifetime. [Both in relation to other people "after (your) death", and possible future de-cryo scenarios.]

Comment by pattern on A 'Practice of Rationality' Sequence? · 2020-02-15T19:12:32.399Z · score: 4 (2 votes) · LW · GW

Rather than gathering content here, we could recognize sequences on other sites.

Comment by pattern on Bayesian Evolving-to-Extinction · 2020-02-15T05:02:47.624Z · score: 2 (1 votes) · LW · GW
we can think of Bayes' Law as myopically optimizing per-hypothesis, uncaring of overall harm to predictive accuracy.

Or just bad implementations do this - predict-o-matic as described sounds like a bad idea, and like it doesn't contain hypotheses, so much as "players"*. (And the reason there'd be a "side channel" is to understand theories - the point of which is transparency, which, if accomplished, would likely prevent manipulation.)

We can imagine different parts of the network fighting for control, much like the Bayesian hypotheses.

This seems a strange thing to imagine - how can fighting occur, especially on a training set? (I can almost imagine neurons passing on bad input, but a) it seems like gradient descent would get rid of that, and b) it's not clear where the "tickets" are.)

*I don't have a link to the claim, but it's been said before that 'the math behind Bayes' theorem requires each hypothesis to talk about all of the universe, as opposed to human models that can be domain limited.'

Comment by pattern on A 'Practice of Rationality' Sequence? · 2020-02-15T00:33:10.752Z · score: 2 (1 votes) · LW · GW


And while prediction may be a skill, even if a project 'fails' it can still build skills/knowledge. On that note:

What could/should be a part of a 'practice' of rationality?

What skills/tools/etc. will (obviously) be useful in the future? and

What should be done about skills/tools/etc. that aren't obviously useful in the future now, but will be with hindsight?

Comment by pattern on A 'Practice of Rationality' Sequence? · 2020-02-15T00:32:53.111Z · score: 2 (1 votes) · LW · GW

The TL:DR comment on this is also the conclusion.

It was a group of rather committed and also individually competent rationalists, but they quickly came to the conclusion that while they could put in the effort to become much better at forecasting, the actual skills they'd learn would be highly specific to the task of winning points in prediction tasks, and they abandoned the project, concluding that it would not meaningfully improve their general capability to accomplish things!!

What you (can) learn from something might not be obvious in advance. While it's possible they were right, it's possible they were wrong.

And if you're right, then doing the thing is a waste, but if you are wrong then it's not.*

*Technically the benefit of something can equal the cost.

U(x) = Benefit - Cost. The first is probabilistic - in the mind, if not in the world. (The second may be as well, but to a lesser extent.)

If this is instead modeled using a binary variable 'really good (RG)', the expected utility of x is roughly:

Outcome_RG*p_RG + Outcome_not*(1-p_RG) - cost

But this supposes that the action is done or not done, ignoring continuity. You to superforecaster-you is a continuum. If this is broken up into intervals of hours, then there may exist hours x and y such that U(x) - cost > 0, but U(y) - cost < 0. The continuous generalization is the derivative of 'U(x hours) - cost', which becomes zero where the utility has stopped increasing and started decreasing (or when the reverse holds). This leaves the question of how U(x) is calculated, or estimated. One might imagine that this group could have been right - perhaps the low-hanging fruit of forecasting/planning is Fermi estimates, and they already had that skill/tool.
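As a sketch of this hours-continuum version (every number and the diminishing-returns curve here are invented purely for illustration):

```python
import math

# Hypothetical U(x hours) - cost curve: benefit saturates as hours grow,
# while cost is linear in hours. All parameters are made up.
def expected_utility(hours, p_rg=0.2, outcome_rg=500.0, outcome_not=10.0,
                     cost_per_hour=1.0):
    max_benefit = outcome_rg * p_rg + outcome_not * (1 - p_rg)
    benefit = max_benefit * (1 - math.exp(-hours / 50))  # diminishing returns
    return benefit - cost_per_hour * hours

# Discrete analogue of 'derivative equals zero': the hour count where
# utility peaks. Past it, each extra hour costs more than it returns.
best_hours = max(range(500), key=expected_utility)
```

With these made-up numbers the peak lands around 38-39 hours; both "x hours is worth it" and "y hours is not" exist on the same curve, which is the point of the continuum framing.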

Forecasting (predicting the future) is all well and good if you can't affect something, but if you can then perhaps planning (creating the desired future) is better. The first counterexample that comes to mind is that if you could predict the stock market in advance, then you might be able to make money off of that. This example seems unlikely, but it suggests a relationship between the two - some information about the future is useful for 'making plans'. However, while part of what information that will/could be important in the future may be obvious, that leaves:

  • how to forecast information about the future that's obviously useful (if the forecast is correct)
  • the information that's not obviously useful, but turns out to be important later (This is usually lumped under 'unknown unknowns', but while Moravec's paradox** can be cast as an unknown unknown, the fact that no one had built a machine/robot that did x yet, could be considered known.)

**Moving is harder than calculating.

Comment by pattern on The Catastrophic Convergence Conjecture · 2020-02-14T21:40:25.191Z · score: 2 (1 votes) · LW · GW
For example, a reward function for which inaction is the only optimal policy is "unaligned" and non-catastrophic.

Though if a system for preventing catastrophe (say, an asteroid impact prevention/mitigation system) had its reward system replaced with the inaction reward system, or was shut down at a critical time, that replacement/shutdown could be a catastrophic act.

Comment by pattern on The Reasonable Effectiveness of Mathematics or: AI vs sandwiches · 2020-02-14T20:49:35.023Z · score: 0 (2 votes) · LW · GW
On the other hand, you cannot offload some of your brain's neural networks to a computer (yet, growth mindset).

But you can run neural networks on a computer, and get them to do things for you. (I don't think this has taken off yet in the same way using the internet has.)

But, since a large component of the task is catering to human aesthetic tastes, math cannot compete with innate human abilities that are designed to be human-centric.

I'm skeptical of this. If we have found "math" to be so useful in the domains where it has been applied, why should it be supposed that it won't be useful in the domains where it hasn't been applied? Especially when its role is augmentation:

Now, using math doesn't replace our cognition, it augments it. Even when we use math we actually use all three types of thinking at once: the unconscious intuition, the conscious informal verbal reasoning and (also conscious) mathematical reasoning.

Determining what is safe to eat is not held to be a mystery.

Why should what is delicious to eat be any different? Why is this domain beyond the reach of science?

Comment by pattern on Distinguishing definitions of takeoff · 2020-02-14T17:45:10.829Z · score: 4 (2 votes) · LW · GW
The Event Horizon hypothesis could be seen as an extrapolation of Vernor Vinge's definition of the technological singularity. It is defined as a point in time after which current models of future progress break down, which is essentially the opposite definition of continuous takeoff.

This might be interesting to compare against how models of the stock market have changed over time. (Its particular relationship with statistics may be illuminating.)

Comment by pattern on ofer's Shortform · 2020-02-14T17:38:05.573Z · score: 2 (1 votes) · LW · GW

In theory, antitrust issues could be less of an issue with software, because a company could be ordered to make the source code for their products public. (Though this might set up bad incentives over the long run, and I don't think this is how such things are usually handled - Microsoft's history seems relevant.)

Comment by pattern on Confirmation Bias As Misfire Of Normal Bayesian Reasoning · 2020-02-13T21:03:32.203Z · score: 2 (1 votes) · LW · GW

While "isolated demands for rigor" may be suspect, an outlier could be the result of high measurement error* or model failure. (Though people may be systematically overconfident in their models.)

*Which has implications for the model - the data thought previously correct may contain smaller amounts of error.

Comment by pattern on Confirmation Bias As Misfire Of Normal Bayesian Reasoning · 2020-02-13T21:00:12.619Z · score: 2 (1 votes) · LW · GW
The protection trick here is "natural scepticism": just not update if you want to update your believes. But in this case the prior system becomes too rigid.

(not update if you want to protect your beliefs?, not update if you don't want to update your beliefs?)

Skepticism isn't just "not updating". And protection from what?

Comment by pattern on In theory: does building the subagent have an "impact"? · 2020-02-13T19:30:36.133Z · score: 2 (1 votes) · LW · GW
Once SA is built, A can just output ∅ for ever, keeping the penalty at 0, while SA maximises R0 with no restrictions.

So these impact measures are connected to individual actions, and an agent can achieve arbitrarily high impact via a long enough sequence of actions whose individual impact is less than R0, and it has an incentive to do so, because the sum of an infinite series of finite non-decreasing rewards diverges (which it evaluates individually, and thus has no problem with there being a divergent sum)?
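A toy numeric sketch of that worry (the threshold and impact numbers are invented):

```python
# Each individual action's "impact" stays under the per-action threshold,
# yet the running total grows without bound as the sequence lengthens.
threshold = 0.1
per_action_impact = 0.09  # below the per-action check
total_impact = 0.0
for _ in range(1000):
    assert per_action_impact < threshold  # every action passes individually
    total_impact += per_action_impact
```

After 1000 steps `total_impact` reaches 90, and it diverges as the number of steps grows - evaluating actions individually never notices the sum.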

Comment by pattern on Building and using the subagent · 2020-02-12T20:27:40.386Z · score: 4 (2 votes) · LW · GW
But that’s a side effect of the fact that A might like to move itself beyond the reach of ρ.

'X is a side effect of Y', is different from 'X and Y have a common cause'.

Comment by pattern on Demons in Imperfect Search · 2020-02-12T04:20:55.507Z · score: 2 (1 votes) · LW · GW


slowing the ball's descent to a crawl, conserving its potential energy in case a sharp drop [is] needed to avoid a competitor's wall.
Comment by pattern on Demons in Imperfect Search · 2020-02-12T04:03:37.668Z · score: 3 (2 votes) · LW · GW
Here's an example that comes to mind:


Comment by pattern on Attainable Utility Landscape: How The World Is Changed · 2020-02-11T18:20:24.076Z · score: -1 (1 votes) · LW · GW

There is a star, many light years away. If you exist in two locations simultaneously, from both of which the star is visible, and those two locations are not the same distance from the star, then intuitively, by seeing the star first from the closer position, you can know what it will look like from the second before it happens.

Less trivially, by altering the relative speeds of the two versions (with FTL telepathy), and setting up suitable devices for signaling, I think in theory this would enable turning FTL into time travel. (Person A performs a calculation, and sends the results to Person B. Since Person A is the future version of Person B, and they're the same person in two places simultaneously, then by 'de-synchronizing them right' a message can be sent into the past.)

Comment by pattern on Matt Goldenberg's Short Form Feed · 2020-02-11T18:04:22.524Z · score: 2 (1 votes) · LW · GW

Are you good at teaching people (your) existing conceptual models? (As opposed to how to make their own.)

Comment by pattern on Why do we refuse to take action claiming our impact would be too small? · 2020-02-10T22:45:35.018Z · score: 5 (3 votes) · LW · GW

How is impact correctly estimated (or its order of magnitude)? (And how can it be correctly estimated?)

Comment by pattern on Attainable Utility Landscape: How The World Is Changed · 2020-02-10T19:51:02.525Z · score: 4 (2 votes) · LW · GW
Going to the green state means you can't get to the purple state as quickly.
On a deep level, why is the world structured such that this happens? Could you imagine a world without opportunity cost of any kind?

In a complete graph, all nodes are directly connected.

Equivalently, we assumed the agent isn't infinitely farsighted (γ<1); if it were, it would be possible to be in "more than one place at the same time", in a sense (thanks to Rohin Shah for this interpretation).

The opposite of this, is that if it were possible for an agent to be in more than one place at the same time, they could be infinitely farsighted. (Possibly as a consequence of FTL.)

Comment by pattern on A Simple Introduction to Neural Networks · 2020-02-10T02:10:04.055Z · score: 4 (2 votes) · LW · GW


This was a very clear explanation. Simplifications were used, then discarded, at good points. Everything built up very well, and I feel I have a much clearer understanding - and more specific questions. (Like how is the number of nodes/layers chosen?)



What's the "ℓ"? (I'm unclear on how one iterates from L to 2.)

Nonetheless, I at least feel like I now have some nonzero insight into why neural networks are powerful, which is more than I had before reading the paper.

And you've explained the 'ML is just matrix multiplication no one understands' joke, which I appreciate.

As mentioned, we assume that we're in the setting of supervised learning, where we have access to a sequence S=((x1,y1),...,(xm,ym)) of training examples. Each xi is an input to the network for which yi is the corresponding correct output.

This topic deserves its own comment. (And me figuring out the formatting.)

For unimportant reasons, we square the difference

Why squaring rather than absolute value - because bigger errors are quadratically worse, because it was tried and worked better, or tradition?
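One standard consideration, sketched below (a general point about loss functions, not necessarily the article's reason): squaring yields a gradient proportional to the error, while absolute value yields a constant-magnitude gradient with a kink at zero.

```python
# Gradients of the two candidate losses with respect to the prediction.
def squared_error_grad(pred, target):
    return 2 * (pred - target)             # d/dpred of (pred - target)**2

def abs_error_grad(pred, target):
    return 1.0 if pred > target else -1.0  # undefined exactly at pred == target
```

A far-off prediction gets a proportionally larger squared-error gradient (`squared_error_grad(5, 1)` is 8), while the absolute-error gradient stays at magnitude 1 no matter how wrong the prediction is.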

This makes it convenient to use in the backpropagation algorithm.

Almost as convenient as the identity function.

Comment by pattern on Pattern's Shortform Feed · 2020-02-08T00:47:20.173Z · score: 4 (2 votes) · LW · GW

There is a general pattern that occurs wherein something is expressed as a dichotomy/binary. Switching to a continuum afterwards is an extension, but this does not necessarily include all the possibilities.

Dichotomies: True/False. Beautiful/Ugly.


Logic handles this by looking for 'all true'.

If 'p' is true, and 'q' is false, 'p and q' is false.

More generally, a sentence could be broken up into parts that can be individually rated. After this, the ratio of true (atomic) statements to false (atomic) statements could be expressed - unless all the sub-statements are true, or all false. This can be fixed by expressing the 'score' as a (rational) number, with two choices of score:

true(sentence) = number of true statements / number of statements

false(sentence) = number of false statements / number of statements

And since every statement is true or false:

true(s) + false(s) = 1

And if we want to express how much truth is expressed, true(s)*num(s) = # of true statements. (These functions don't have the best relationship to each other, they're just meant to be intuitive enough.)
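A toy sketch of these score functions, with atoms hand-labelled as booleans (nothing here addresses how a sentence gets split into atomic statements):

```python
# Ratio-of-true-substatements idea: atoms is a list of booleans,
# one per atomic statement in the sentence.
def true_score(atoms):
    return sum(atoms) / len(atoms)

def false_score(atoms):
    return 1 - true_score(atoms)

# "The sky is blue, and the clouds are red" -> one true atom, one false atom
sky_and_clouds = [True, False]
```

Here `true_score(sky_and_clouds)` is 1/2, and the two scores always sum to 1, matching true(s) + false(s) = 1.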

Consider the assumption: every statement is true or false. (Exclusively.)

Instead of diving into paradox, consider functions. Equality(s) returns the sentence "s is true". Negation(s) returns "s is false". These functions don't have a truth value, it's dependent on the variable that's passed in. Concat(s_1, s_2) returns "s_1 s_2", which can be just gibberish. But why is "Equality" named Equality - it preserves truth value but there are other functions with that property? It might be better thought of as a family of functions.

Now consider the function f that, given s, returns "s is true and s is false". And here is a function that is always false. Right?

(This next paragraph* is a framing without examples, and may be rejected or accepted. I'm treating 'paradoxes' in this way because, as the paragraph after it notes, truth seems to come from a system.)

But just as (self referential) sentences can be constructed that are 'paradoxical' - neither 'false' nor 'true', sentences may also be constructed which 'are both'. This may be resolved by pointing out that the first are "nonsense", and resolving that "nonsense is false", and saying that it doesn't matter what value is assigned to the second as there is no consequence. (For such a sentence may be false, or it may be true, but not both at once.) But these resolutions are at odds. Are not both kinds "nonsense"? Or if they are different, they seem different from both 'statements which can only be true' and 'statements which can only be false'.

To get back to our 'functions' (which take sentences as input, and return a sentence as output), consider the sentence "1+1=2". Is this true? In many systems yes, "base 3, base 4, base 5, ...", but not in "base 2", where "2" is not defined, "1+1=10". These systems may be converted between, and we may even say that while something is expressed one way in one system, and another way in another system, they're the same "fact" (or falsehood).

But having different systems enables much confusion. Two (or more) people might disagree on what color the sky currently is - even if they both have eyes that work fine, and without any unusual atmospheric phenomena that change what the sky looks like if you take a few steps to the right or the left - if only they disagree on what the words for colors mean. If you call X "red" and I call X "blue", we may still both see X.

To get back to truth(s) which can return "2/3" (meaning s contains 3 statements, 2 of which are true, one of which is false), why return one number? Why not two: 2,1: 2 true statements, 1 false. But there could be more statements than those two kinds. And here the path splits in two.

1. A particular method of assigning one value to a sentence may 'fail on the paradox', or choose to call it false.* One method, one answer - every statement is true or false, exclusively.

2. A set for each possibility: It is true, it is false, it can be true or false, it cannot be either, etc. There's still a binary aspect to this: "is it true" receives the answer "yes" or the answer "no" exclusively. But, independently, "is it false" may also receive either answer.

Following the 2nd path, what does it mean for something to be true and false? Neither?

One way is this: "The sky is blue, and the clouds are red." Part of it is true, and part of it false. That which holds neither truth nor falsehood is nonsense.

How does this generalize? For that another dichotomy will be required.

Beautiful/Ugly***. While this may be subjective, the quaternary** view can be seen as claiming the binary view is false, some things are both beautiful and ugly, and some things are neither. Perhaps here this view will be less controversial, after all, if a thing is judged to be beautiful by one person, and ugly by another, "subjectively", then "objectively" might not the object be both? Perhaps something ugly and beautiful could be created by cutting something beautiful in half, and something ugly in half, and combining them? This may be trickier than combining a true statement and a false statement, but perhaps if something is both beautiful and ugly, both aspects can be seen, where something that is true and false might be swiftly proclaimed 'all wrong' (or all right).

Perhaps this has all just been confusing, or perhaps it will be useful. The notion of 'logical counterfactuals/counter-logicals' has seemed strange to me - it is not that "it could be that 2+3 = 4"; rather, that must be a different system. What such a thing could mean in conjunction with a world - say, if you put 2 things in a container, and then three, and what results is 4 - seems unclear. (Even making them creatures doesn't make sense, for if one eats another, why won't that happen later?) If it holds for a class of objects, then that changes the relationship between numbers and objects - an apple and an orange together are two things, but even if all things have the property that under certain circumstances they react to produce or eliminate another of the same type, then unless this holds between classes, no more might one speak of an apple and an orange being 2, because they don't react with each other.

*Paradoxes working this way may be avoided by system design.

**One may eliminate one of these categories, and say that nothing is neither beautiful nor ugly. Then the category still 'exists' though it has no members - a broader view may include things that are not, but absent a process for creating new categories, the more expansive view may be better before examining reality. And if someday that person finds something which is neither, then the bucket will be ready for this new object, unlike anything seen before.

***This is one area where things may not be fixed, in a way that we don't see in math or logic. A view in which things don't have properties may be more useful - but it is harder to see this for things/properties like "numbers" which 'seem to exist'. "The tree falls in the forest" argument may also be had about beauty.

Comment by pattern on "But that's your job": why organisations can work · 2020-02-07T19:07:36.198Z · score: 4 (2 votes) · LW · GW

How do you find this concept relevant to the article?

Background for people not familiar with the term:

Kakonomics describes cases where people not only have standard preferences to receive a High-quality good and deliver a Low-quality one (the standard sucker's payoff) but they actually prefer to deliver a Low-quality good and receive a Low-quality one, that is, they connive on a Low-Low exchange.

Comment by pattern on "But that's your job": why organisations can work · 2020-02-07T01:33:02.869Z · score: 2 (1 votes) · LW · GW


the is more
Comment by pattern on "But that's your job": why organisations can work · 2020-02-07T01:30:37.048Z · score: 3 (2 votes) · LW · GW

Why won't the best systems win?

Comment by pattern on ryan_b's Shortform · 2020-02-06T22:50:48.432Z · score: 4 (2 votes) · LW · GW

The post you linked to (algorithmic efficiency is about problem information) - the knowledge that method X works best when conditions Y are met, which is used in a polyalgorithmic approach? That knowledge might come from proofs.

Comment by pattern on ryan_b's Shortform · 2020-02-06T22:48:53.476Z · score: 3 (2 votes) · LW · GW

A proof may show that an algorithm works. If the proof is correct*, this may demonstrate that the algorithm is robust. (Though you really want a proof about an implementation of the algorithm, which is a program.)

*A proof that a service will never go down which relies on assumptions with the implication "there are no extreme solar storms" may not be a sufficient safeguard against the possibility that the service will go down if there is an extreme solar storm. Less extremely, perhaps low latency might be proved to hold, as long as the internet doesn't go down.
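One hedged way to bridge the gap between a proof about an algorithm and trust in a program is to also test the implementation directly; a minimal sketch of a randomized property check (the sort under test is just a stand-in):

```python
import random

def insertion_sort(xs):
    """The implementation under scrutiny. A proof might cover the
    algorithm; this particular code could still contain a bug."""
    out = []
    for x in xs:
        i = len(out)
        # Walk left until everything before position i is <= x.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def check_sorted_property(trials=1000):
    """Randomized check: on random inputs, the output must equal the
    reference answer (ordered, and a permutation of the input)."""
    for _ in range(trials):
        xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
        assert insertion_sort(xs) == sorted(xs)
    return True

print(check_sorted_property())
```

This is the weaker cousin of a proof about the implementation: it can only falsify, never establish, the property - but it tests the program you actually run, assumptions and all.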

How are algorithms made, and how can proofs improve/be incorporated into that process?

Given a problem, you can try and solve it (1). You can guess (2). You can try (one or more) different things and just see if they work (3).

1 and 2 can come apart, and that's where checking becomes essential. A proof that the method you're using goes anywhere (fast) can be useful there.

Let's take a task:

Sorting. It can be solved by:

  • 1. Taking a smaller instance, solving that (and paying attention to process). Then extract the process and see how well it generalizes
  • 2. Handle the problem itself
  • 3. Do something. See if it worked.

2 and 3 can come apart:

At its worst, 3 can look like Bogosort. Though that process can be improved: look at the first two elements. Are they sorted? No: shuffle them. Look at the next two elements...

4! = 24, twenty-four permutations of 4 elements. The sorting so far has eliminated all but six:

1, 2, 3, 4

1, 3, 2, 4

1, 4, 2, 3

2, 3, 1, 4

2, 4, 1, 3

3, 4, 1, 2

Now all that's needed is a method of shuffling that doesn't make things less orderly... And eventually Mergesort may be invented.
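The elimination step can be checked mechanically, and the "shuffle that doesn't make things less orderly" is essentially a merge of sorted runs; a sketch (the variable names are mine):

```python
from itertools import permutations

# After the first two positions and the last two positions have each
# been put in order, only permutations with both pairs sorted remain:
# 24 / 2 / 2 = 6 survivors.
survivors = [p for p in permutations([1, 2, 3, 4])
             if p[0] < p[1] and p[2] < p[3]]
print(len(survivors))  # 6

def merge(left, right):
    """Combine two sorted runs without ever un-sorting them."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def mergesort(xs):
    """Sort pairs, then merge sorted runs into longer sorted runs."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    return merge(mergesort(xs[:mid]), mergesort(xs[mid:]))

print(mergesort([2, 4, 1, 3]))  # [1, 2, 3, 4]
```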

In the extreme, 3 may be 'automated':

  • programs write programs, and test them to see if they do what's needed (or a tester gets a guesser thrown at it, to 'crack the password')
  • evolutionary algorithms
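A toy version of that 'automated 3', in the guess-and-check spirit of the bullets above - the target, fitness function, and loop parameters are invented purely for illustration:

```python
import random

# A hypothetical "tester": candidates are scored against a hidden target.
TARGET = [1, 0, 1, 1, 0, 1, 0, 0]

def fitness(candidate):
    """How many positions of the guess pass the test."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def evolve(generations=2000, seed=0):
    """(1+1) evolutionary loop: mutate one bit, keep the child
    whenever it does at least as well on the test."""
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in TARGET]
    for _ in range(generations):
        child = best[:]
        child[rng.randrange(len(child))] ^= 1  # flip one random bit
        if fitness(child) >= fitness(best):
            best = child
        if best == TARGET:
            break
    return best

print(evolve() == TARGET)
```

Nothing here "understands" sorting or passwords; the tester does all the work, which is the sense in which 2 and 3 come apart.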
Comment by pattern on Mazes Sequence Roundup: Final Thoughts and Paths Forward · 2020-02-06T22:13:41.316Z · score: 4 (2 votes) · LW · GW

I appreciated this sequence - the posts, and as a whole.

One thing someone will need to write at some point is (6a) Mazes That Are Not Within Organizations, discussing dynamics that produce similar results without people strictly being bosses and subordinates. And generally (6b) What Types of Things are How Maze-Like, (6c) To What Extent do People At Large Have the Maze Nature, (6d) Close Examination of Maze Interactions, and so on.

The notes on future works were useful.

One hint you might be in a maze is that you are “doing the thing” in quotation marks rather than doing the thing.

As was this.

I feel like this might get at the heart of the 'why is optimizing bad?' question around this - if mazes are less effective, then how do we get rid of them if not by optimization?*

*One answer is alignment, but optimization offers something to be aligned to.

'Too much optimization = not enough slack' reads like 'optimizing for the wrong things'.

Comment by pattern on Plausibly, almost every powerful algorithm would be manipulative · 2020-02-06T21:42:39.878Z · score: 2 (1 votes) · LW · GW
Manipulation emerges naturally

Empirical claims. (Creating a specific example (running code) does not demonstrate "natural", but can contribute towards building an understanding of what conditions give rise to the hypothesized behavior, if any.*)

Of course, the manipulation above happened because the programmers didn't understand what the algorithm's true loss function was. They thought it was "minimise overall loss on classification", but it was actually "keep each dataset loss just above 0.1".

This seems incorrect. The scenario highlighted that with that setup, the way "minimise overall loss on classification" was optimized led to the behavior "keep each dataset loss just above 0.1". Semantics, perhaps, but the issue isn't that the algorithm was accidentally programmed to keep each dataset loss just above 0.1; rather, that behavior is a result of its learning in that setup.

*A tendency to forget things could be a blessing - a representation of the world might not be crafted, and a "manipulative" strategy not found. (One could argue that by this definition humans are "manipulative" if we change our environment - tool use is obviously a form of 'manipulation', if only 'manipulating using our hands/etc.'. Similarly if communication works, it can lead to change...)

There is no clear division, currently, between mild manipulation and disastrous manipulation.

The story didn't seem to include a disaster.

Comment by pattern on Chris_Leong's Shortform · 2020-02-06T19:17:07.428Z · score: 5 (3 votes) · LW · GW

You made a good point, so I inverted it. I think I agree with your statements in this thread completely. (So far, absent any future change.) My prior comment was not intended to indicate an error in your statements. (So far, in this thread.)

If there is a way I could make this more clear in the future, suggestions would be appreciated.

Elaborating on my prior comment via interpretation, so that its meaning is clear, if more specified*:

[A] it's a contradiction to have a provable statement that is unprovable, [B] but it's not a contradiction for it to be provable that a statement is unprovable.
[A'] It's a contradiction to have an unprovable statement that is provable, [B'] but it's not a contradiction for it to be unprovable that a statement is provable.

A' is the same as A because:

it's a contradiction for a statement to be both provable and unprovable.

While B is true, B' seems false (unless I'm missing something). But in a different sense B' could be true. What does it mean for something to be provable? It means that 'it can be proved'. This gives two definitions:

  • a proof of X "exists"
  • it is possible to make a proof of X

Perhaps a proof may 'exist' such that it cannot exist (in this universe). That is, as a consequence of its length and complexity, and bounds implied by the 'laws of physics'* on what can be represented, constructing this proof is impossible. In this sense, X may be true, but if no proof of X may exist in this universe, then:

Something may have the property that it is "provable", but impossible to prove (in this universe).**

*Other interpretations may exist, and as I am not aware of them, I think they'd be interesting.

**This is a conjecture.

Comment by pattern on Chris_Leong's Shortform · 2020-02-06T07:29:18.275Z · score: 3 (2 votes) · LW · GW
Here's one way of explaining this: it's a contradiction to have a provable statement that is unprovable, but it's not a contradiction for it to be provable that a statement is unprovable.

Inverted, by switching "provable" and "unprovable":

It's a contradiction to have an unprovable statement that is provable, but it's not a contradiction for it to be unprovable that a statement is provable.

Comment by pattern on Chris_Leong's Shortform · 2020-02-06T07:26:15.532Z · score: 2 (1 votes) · LW · GW

Your duties (towards others) may include what you are supposed to do if others don't fulfill their duties (towards you).

Comment by pattern on Eukryt Wrts Blg · 2020-02-06T00:55:54.841Z · score: 2 (1 votes) · LW · GW

Category Theory Without The Baggage seems relevant.

Comment by pattern on Eukryt Wrts Blg · 2020-02-06T00:54:04.050Z · score: 2 (1 votes) · LW · GW

Differentiation could also be used to enable a more organized effort to make material more reachable to a wider audience. (Like wikipedia versus simple wikipedia.)

Comment by pattern on Chris_Leong's Shortform · 2020-02-06T00:48:27.699Z · score: 5 (2 votes) · LW · GW
it seems that most of the harm of holding onto a grudge comes from the emotional level and the drives level, but less from the duties level.

The phrase "an eye for an eye" could be construed as duty - that the wrong another does you is a debt you have to repay. (Possibly inflated, or with interest. It's also been argued that it's about (motivating) recompense - you pay the price for taking another's eye, or you lose yours.)

Comment by pattern on [AN #85]: The normative questions we should be asking for AI alignment, and a surprisingly good chatbot · 2020-02-05T19:29:28.738Z · score: 2 (1 votes) · LW · GW


we only get data [from] the outer objective on the training distribution,
Comment by pattern on Meta-Preference Utilitarianism · 2020-02-05T18:13:27.109Z · score: 2 (1 votes) · LW · GW
If there is such a thing as ‘meta-preference ambivalence’ we could gauge that too: “People who do not have any meta-preferences in their utility-function get a score of 0, people for whom the entire purpose in life is the promotion of average utilitarianism will get a score of 1 etc.
Just multiply the ambivalence with the meta-preference and then add all the scores of the individual methods together (add all the scores of the preferences for “median utility” together, add all the scores for “total utility” together etc) and compare.

This seems unnecessary. Ambivalent means the weight given to the different options is a 1:1 ratio.
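Read literally, the quoted scheme reduces to a weighted sum per method; a minimal sketch with made-up voters and weights (the method names and numbers are purely illustrative):

```python
# Each voter assigns a weight in [0, 1] to each meta-preference
# (0 = no meta-preference at all, 1 = it is their entire purpose in life).
votes = [
    {"average": 0.0, "total": 0.0},    # fully ambivalent voter
    {"average": 0.5, "total": 0.25},
    {"average": 0.25, "total": 1.0},
]

def tally(votes):
    """Sum each method's weighted scores across voters."""
    totals = {}
    for vote in votes:
        for method, weight in vote.items():
            totals[method] = totals.get(method, 0.0) + weight
    return totals

scores = tally(votes)
print(scores)                        # {'average': 0.75, 'total': 1.25}
print(max(scores, key=scores.get))   # 'total'
```

As the comment notes, a separate "ambivalence" multiplier adds nothing here: a voter who weights the options equally (including all zeros) simply doesn't move the relative totals.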

Let’s say we were able to gauge everyone’s (underlying) preferences about how much they like certain methods of maximizing by holding a so called utilitarian vote.

What should the voting method to start with be?

EDIT: This other comment from the OP suggests that ratios aren't taken into account, and ambivalence is accounted for by asking about it as a question.

Comment by pattern on Meta-Preference Utilitarianism · 2020-02-05T17:50:29.349Z · score: 2 (1 votes) · LW · GW
if the true morality left nothing underspecified, then morally-inclined people would have no freedom to choose what to live for. I no longer think it's possible or even desirable to find such an all-encompassing morality.

Consider the system "do what you want". While we might not accept this system completely (perhaps rejecting that it is okay to harm others if you don't care about their wellbeing), it is an all-encompassing system, and it gives you complete freedom (including choosing what to live for).