Comment by pattern on No, it's not The Incentives—it's you · 2019-06-18T04:16:34.476Z · score: 1 (1 votes) · LW · GW

One idea on the subject of government is "eventually it will fail/fall. This has happened a lot throughout history, and it will happen someday to this country. Things may keep getting big/inefficient, but the system keeps chugging along until it dies."

One alternative to this would be to start a group/country/etc. with an explicit end date, or something similar with regard to some aspect. (Reviewing all laws on the books to see if they should stick around would be a big deal, as would implementing laws with end dates, or only laws with end dates. Some consider this to have failed in the past though, as emergency powers demonstrate.)

Comment by pattern on Is there a guide to 'Problems that are too fast to Google'? · 2019-06-18T04:08:26.449Z · score: 0 (0 votes) · LW · GW

Culture/Professionals. (Not the direct answer, but I hope this aids your search.)

Comment by pattern on Pattern's Shortform Feed · 2019-06-18T02:37:56.806Z · score: -1 (2 votes) · LW · GW


Short version:

This post/comment tries to do too many things, and is a mess and will be revised. I am using the Shortform feed format right now because I suspect the device I am using may die and never come back, and so I am submitting this as is, then editing it, and trying to save as frequently as possible despite misplaced perfectionism.

EDIT: I'm done writing and will come back to this later.

Depending on your Firefox settings, you may have to log back in whenever you enter a new page on this site.

Long version:

(This addresses each topic as it comes up, with less than ideal structure. So it contains too many ideas that should be separated, and perhaps get their own comment in this feed, and eventually a post. So it is long. There's also some format experimentation here, and it may be hard to read. While it is presented as being 'about' LessWrong, it is intended as an exploration of possible structures, and their possible benefits and costs, if I get there. Feel free to use these ideas; I'd love to hear about them being used!)

A Short Version:

Ideas can be implemented in mediums. Inspired by: It is useful to keep this in mind for purposes of categorization, generating new ideas/content, and seeing the value of work in the abstract - new forms present new opportunities. The ways ideas are presented can also have "structural support" or "be built to support that idea/flow." Example: clicking on footnotes moves the page down to the footnote in question.

Insufficent or pore strucutre may be hard on the reader. (Just as reading those spelling mistakes may be painful.) It may also lead to the author forgetting to "close threads".

A Long version:

Some Thoughts On Ontology/Structure

Posts/Comments: Mostly static. Exceptions: Updates. (Usually Marked: "UPDATE" OR "EDIT", respectively.)


Lists are about change/growth.


A shortform feed is a list - it's meant to be a collection of thoughts that grows. Due to implementation it happens to be linear/hierarchical (on first glance).**

In this light technically the comments section of any post is a list. On Reddit comment sections are temporally limited. On LessWrong, there isn't such a technical limitation in place, but if you glance at the list of All Posts, not all comment sections have been active. (This is trivially the case for posts with no comments, and may be affected by how often you use this site.) There are x ways of handling this:

1. Consider why this is the case - a) perhaps "social reasons". b) Perhaps because the set of posts is a list - it keeps growing, and as people tend to read new posts their [1] comment sections get activity. c) SIZE. While I read the sequences, I didn't make it through all of the comment sections because they were so long. [3]

[1] It's interesting to think about what would happen if this wasn't a static connection. Imagine if comments sections occasionally underwent shifts, and swapped with each other. On the one hand this sounds like a terrible idea - the lack of context could lead to a loss of meaning. On the other hand, it might lead to interesting combinations or comparisons of ideas. [2]

[2] Comments sections on different posts could interlink more, but they don't seem to for some reason. (This happens more if they're "linked", but a system for measuring linkage might also measure this.) A tad tautological.

2. Posts embody a different idea - a presentation of ideas, then (optionally) a discussion. So the way comments sections behave is a result of the way people tend to engage with ideas. Getting different behavior will require getting people (who are currently on the same page) on a different page, where things work differently. [4] While recommendations have some possibilities in this direction [3] they operate on an individual level. Re-runs may be intended to address this - on both levels: for both authors and readers. It might have been their inspiration.

[4] One way of addressing this, that some people are already using/working on is re-runs - although they also may interleave new material. This might break one or more of the properties of lists I was aiming for in this discussion, with regards to time. On the other hand, it introduces a new dimension, and preserves that order on a different level.

A set of ideas (as posts) is initially presented (in a specific order) A-Z. Then they're presented again (version 2) with additions. If they'd previously been prefixed with a marker of their order in the bundle (say, a letter of the alphabet), that order might change the second time around, but the prefix could also include a number indicating which version it had arrived in. (These prefixes are usually implicit, and the structure I'm describing probably won't exist for several reasons - it may be easier to add to an existing sequence than create a new one with the same name and "version 2". Indeed, this would be easier to have as part of the system - posts already contain temporal notions, or different versions, which have associated times.)


Using [#] is better than using asterisks when you have a lot of such notes. (Refactored.)

It's easier to read things that separate ideas out to be presented on their own. While hierarchy seems reasonable, with dropping down level after level, without structural support for that (as in programming, where you can have as many nested parentheses as you want, and convenient tabbing so it's visually apparent) or building structure* it doesn't work very well.**

*One way of doing this is creating a bunch of comments, and then having a comment that gives an order to them all, so they form a cohesive whole. One problem with this is that hyperlinks are one-way. The idea also has a weird feel to it, so there may be other issues.

**There was probably a better way to demonstrate this than writing a piece with this issue which wasn't marked satire; which used footnotes to mark "separate idea begins", and "it's here", but not "and it ends here".

End Meta.

Some things that are obviously not "Posts" although the site calls them that:

A Shortform Feed.

The thing that's posted every month that's nothing but comments...

Posts like this excellent one: Reasonable Explanations. The author may describe the rules, and include a starter. (Metaphor: they're used for creating yogurt, bread, and a bunch of other things. The Wikipedia page isn't very good.)

Less obvious:

A Re-run can also be seen as: a version 2 of a post (not used yet), or a version 2 of its comments section.

This brings us to different types of lists:

All Posts (and its variants)



Those last two for instance are the same thing as ideas, though not in the site's eyes.

When someone makes an account, that ensures their posts will be a Sequence - no cumbersome difficulty adding a post to it after each one, that happens automatically! That's "structure supported by the medium."

When someone makes a new Shortform feed it doesn't automatically show up in that sequence. (To my knowledge, the person who made the sequence would have to update it, or the mods. Perhaps it works differently if you create a sequence out of your blog posts, and can make a new post in that sequence - I haven't checked yet.) (This sounds like prime wiki material - or maybe it's stuff you know if you look at the source code. If there was a wiki, it could look just like the original site, except go to different places when you clicked on things - but that would be a lot of work. Whatever it looked like, a breakdown of meta stuff, without long blog posts, could be useful, especially if it highlighted places where things could grow, like templates for commenting guidelines - but that's another post. For now I'll note that while Shortform feeds could be implemented differently in the site, that seems potentially downstream of commenting guideline templates. On the other hand, they could represent a different way of presenting information on this site, and engaging with it, and that could go to very different places, and be revolutionary, and that might require more change, rather than less.)

But what about when lists...end?

Then they become dead, or "archives". But not all content works this way.

1) On wikis where all pages may be modified by anyone (or any user), anything may change. If LessWrong was this way, then anyone could change a sequence of posts. This potentially opens things up to vandalism, but helps things stay up to date. But it might not be the ideal tool (in that form) - maybe it would be better if there was an option for adding a post to a sequence when it's posted/drafted. But eventually, these things end, right?

2) Allegedly, the Greeks had an unusual practice, which influenced works such as Euclid's Elements. (Before it became the best textbook ever, I mean the material it contains.) "It was not uncommon in ancient times to attribute to celebrated authors works that were not written by them. It is by these means that the apocryphal books XIV and XV of the Elements were sometimes included in the collection.[29]"

Today, some books'/series' original authors perished before they were finished - some were published as is, some were finished by other authors. Some books are considered to be other works' "spiritual successor". (That link doesn't seem to do the topic justice, the TVTropes page might do a better job, it certainly has more examples. I'm not familiar with the standard warning, so "It may consume your time that could have been better spent" will have to do. Link.)

Naturally, attempting to create similar things in physical form today might be difficult legally. (Publicly announcing "I am publishing the sequel series to Lord of the Rings", or selling my version physically appended to the end of a copy, might get me sued. I'm not sure about that last one. Rebinding might be a pain.)

And last but not least, amidst me going on about lists and structure, the fact that I'm referring to a linear structure has been so far implicit. The Sequences (in the library) that I've seen so far seem to roll with this.

This, of course, isn't strictly necessary. Comments are after all arranged in a tree. This has the downside that there may be things like "functionally identical top level comments", or separate conversations within comments may converge, or discussions (between 2 people) may split and remain that way. (I have a couple examples.) This isn't too bad when it only happens once, but if it happens several times it could be a mess. I see these as problems to be fixed (although there are times when these things would be desirable, avoiding them when those conditions haven't been met might be useful), as opposed to "great experiences to have".

Comment by pattern on Reasonable Explanations · 2019-06-16T23:56:45.075Z · score: 5 (1 votes) · LW · GW

And here I thought that the people recording the time of death had assumed it was a functional clock when it was not.

Comment by pattern on The Univariate Fallacy · 2019-06-16T00:15:19.767Z · score: 2 (1 votes) · LW · GW

Perhaps it anticipates that you will write a sequel.

Comment by pattern on No, it's not The Incentives—it's you · 2019-06-15T18:37:07.736Z · score: 1 (1 votes) · LW · GW
are there any others here, that would endorse the quoted statement as written?

I don't endorse it in that context, because data matters. Otherwise, why not? There are plenty of situations where "bad"/"good" seems like a non-issue*/counterproductive.

*If not outright beneficial.

Comment by pattern on Paper on qualitative types or degrees of knowledge, with examples from medicine? · 2019-06-15T18:30:36.274Z · score: 4 (2 votes) · LW · GW

There's a post on SSC that sounds a little like that:

Comment by pattern on On Having Enough Socks · 2019-06-14T17:49:13.639Z · score: 2 (1 votes) · LW · GW
because there is no specific time or triggering factor to replenish a shrinking sock stockpile, it is easy to run out.

That's easy, you designate a few "backup pairs" (perhaps in a different drawer, or a different place, so you have to think "I'm running out of socks") and when you hit those you buy more (or put them on your shopping list/in your shopping cart).

Comment by pattern on Get Rich Real Slowly · 2019-06-11T22:48:29.935Z · score: 1 (1 votes) · LW · GW

I think the audience is supposed to be self-selecting for this one.

My goal to expand the number of the financially literate, one blog post at a time.

Though in light of the author's goals, such a post may be useful.

I don't see the argument that everyone can get rich slowly.

That's supposed to be in the (prior) linked post.

Comment by pattern on Problems with Counterfactual Oracles · 2019-06-11T19:49:12.798Z · score: 1 (1 votes) · LW · GW
My main concern about the counterfactual oracle is that it doesn't prevent the AI from sending fatal escape messages. Indeed, it spends most of its time in exploratory mode at the beginning (as it is only rewarded with probability ϵ) and might stumble upon an escape message/action then. Even if it is shutdown after answering, the humans will still read the really convincing escape message and release the AI.

The escape message could also include the source code of it/a successor/an assistant*.

*Whereas a successor is a variant of the original, an assistant has a narrower task such as "Secure my release from the box" or "Advise Tesla so that their stock price will go up/down" or "try to manipulate the stock market".

Comment by pattern on No, it's not The Incentives—it's you · 2019-06-11T19:40:01.140Z · score: 1 (1 votes) · LW · GW

I wouldn't call them "economic" actions/decisions - how to do things at a concrete level is about what you want. The altruist may raise money for a charity, and the selfish may act in their own (view of) self-interest, say, to accumulate money/what they value. The difference isn't that the moral don't act economically, it's that they act economically with regards to something else.

Comment by pattern on On why mathematics appear to be non-cosmic · 2019-06-11T19:31:58.181Z · score: 2 (2 votes) · LW · GW

As for whether math "is cosmic" or not:

If we are projecting, then is this tendency one we developed (social) or one we inherited (evolution)? If it is evolutionary, then perhaps* if we ran into intelligent aliens (which evolved) they'd "have math" as well.

If it is a property of living things in the external world (which seems to be the case), then it may be the way they are (as opposed to a projection). And that may also be the result of evolution*. So we may be seeing such things as they are (readily) because we have a tendency to see patterns of certain forms, with the downside of occasionally seeing patterns where there are none as a consequence of this fitting.

*While evolution might "work the same way" in other places, what is specific to Earth isn't super clear, and how much things generalize remains to be seen.

Comment by pattern on On why mathematics appear to be non-cosmic · 2019-06-10T19:33:50.754Z · score: 2 (2 votes) · LW · GW
In other words, do we observe the Fibonacci or golden ratio spiral approximation on the external world because the external world itself is tied to math, or do we do so because we are tied to math in an even deeper way than we realize and could only project what we have inside of our mental world onto anything external?

I wasn't clear on what this question meant, but the reason the ratio of consecutive Fibonacci numbers approximates the golden ratio becomes apparent upon seeing the sequence's closed-form solution (which contains the golden ratio).
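
To make the closed form concrete, here is a minimal sketch, assuming the standard Binet formula F(n) = (phi^n - psi^n) / sqrt(5), with phi = (1 + sqrt(5)) / 2 and psi = (1 - sqrt(5)) / 2. Because |psi| < 1, the psi^n term shrinks toward zero, so F(n+1)/F(n) tends to phi. The function names are my own, for illustration only.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # the golden ratio
PSI = (1 - math.sqrt(5)) / 2   # its conjugate; |PSI| < 1


def fib_iterative(n):
    """F(n) by direct iteration, with F(0) = 0 and F(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed_form(n):
    """F(n) via Binet's formula (floating point, so only reliable for modest n)."""
    return round((PHI ** n - PSI ** n) / math.sqrt(5))


def ratio_error(n):
    """How far F(n+1)/F(n) is from the golden ratio (n >= 1)."""
    return abs(fib_iterative(n + 1) / fib_iterative(n) - PHI)
```

Comparing `ratio_error(10)` with `ratio_error(20)` shows the gap shrinking as n grows, which is the PSI term dying off.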

Comment by pattern on Coercive Formats · 2019-06-10T18:41:10.454Z · score: 1 (1 votes) · LW · GW

While they may make it easy to create different views of information chunks, what's the benefit of such pages if no other users can find them? Having an official, well put together* page hierarchy which starts at the homepage and includes all pages is pretty valuable.

*If the organization system doesn't "cleave reality at the joints" then it's probably not doing its job.

Comment by pattern on Coercive Formats · 2019-06-10T18:34:36.913Z · score: 1 (1 votes) · LW · GW
It’s linear because I created views on the pages which present them in a linear order—which is my point.
The Sequence posts themselves are not publicly editable, for obvious reasons.

Then I don't see a point of disagreement.

In regards to the OP's point, I'd say that not only are "books" a (linear)/simple structure, but physical books may act to coerce such a structure. It's not that I have something against other sorts of structures, just ones lacking clear paths. Are there books which suggest a reading order other than first page to last page? Yes, and they tell you what it is.

The pages are also hyperlinked together in a chaotic manner, as any other wiki is; and of course you can search it, which ditto.

The level on which this occurs is important. A hierarchy requires (clearly distinguished) levels above posts/articles to only reference lower levels (and call them as such).*

and of course you can search it, which ditto.

The linear/hierarchical structure also allows for another kind of searching. If I read it in order, but forget where I am, I can binary search and see if I remember reading something. If I have (including the end), then I can eliminate it from my search along with everything before it. If I haven't, I can eliminate it from my search along with everything after it.
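
The search described above can be sketched as follows. This is a minimal illustration, assuming the pages are numbered 0 to num_pages - 1, that I read them strictly in order, and that `remembers_reading(i)` is a hypothetical stand-in for "do I remember reading page i?".

```python
def first_unread(num_pages, remembers_reading):
    """Return the index of the first page not yet read (num_pages if all were read)."""
    lo, hi = 0, num_pages          # the boundary lies somewhere in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if remembers_reading(mid):
            lo = mid + 1           # read: rule out mid and everything before it
        else:
            hi = mid               # unread: nothing after mid needs checking
    return lo
```

For a 100-page sequence this asks at most 7 "do I remember this?" questions, which is the payoff of having a fixed order to search over.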

*This isn't undermined if these higher level pages note the page which contains them (while being explicit at a minimum that it's "a page which links here"). I'd say something wikis miss is having posts/articles contain a list of pages which link to them. (If not in the sense of not having the tech, then in not making it obvious: UI.)

Comment by pattern on References that treat human values as units of selection? · 2019-06-09T19:41:37.610Z · score: 1 (1 votes) · LW · GW

I think you can talk about what values* are consistent.

*You used the word values to refer to sets of values.

Comment by pattern on Asymmetric Weapons Aren't Always on Your Side · 2019-06-09T16:39:20.342Z · score: 1 (1 votes) · LW · GW

That last sentence didn't make sense:

Lack of such violence, overall, tends to make life much worse for physically weaker non-criminals, even if it might let them get away with occasionally pepper-spraying a catcaller.

How does lack of violence make life worse for 'physically weaker non-criminals'? Are you talking about 'violence directed at those who use violence unacceptably'? ('Meta-violence.')

Comment by pattern on Coercive Formats · 2019-06-09T16:19:48.722Z · score: 1 (1 votes) · LW · GW

Content type: I can read a book by skipping the introduction and table of contents, and going straight to the index/indices.

Grouping: This is a fair point, though most Wikis seem to have a narrower purpose than "encyclopedia". They're usually the encyclopedia of something (and these days Wikipedia supposedly has some limits, though they seem kind of vague).

But if someone separated parts of Wikipedia out into groups, and say identified a subset of pages to be 'Math Wikipedia' or 'Wikipedia Math', or the 'Math Project on Wikipedia' then they might start by identifying all 'Math' pages, putting together a list of 'Math pages', and deciding how important different pages are, and how much work needs to be done on them.

Views on information: Yes. What I see as missing are 3 things: clear groupings*, a reading order within groupings, and flow. Some articles are contradictory because there were fights and so the top of the page has something opposite the middle. Yes some groupings contain others. But when all the organization happens on (topic) articles, then it's a rather messy graph instead of a list which says 'all these things go together'.

*One way of doing this is to have a set of groupings which covers everything (level 1), then, within each of those top level groups, a set of groups which covers everything (level 2), and so on.

Yes, there may be multiple reasonable such sets.

Comment by pattern on Coercive Formats · 2019-06-09T16:01:47.734Z · score: 3 (2 votes) · LW · GW
is, in fact, a wiki.

Which anyone can create an account on, edit, and make new posts/articles? The fact that it looks like a book, rather than a ghastly mess led me to believe otherwise.

Also, it's a tree, and it's obviously self-contained. It...flows. It has a homepage with an introduction and a table of contents which contains tables of contents which contain posts. You read it by reading, scrolling down, clicking (to go down a level), and when you've read that level you go back up and continue reading. (On the bottom level, posts/pages, you don't go any deeper.)

It's linear. There's a clear path through it.
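
One way to see why a tree of tables of contents still gives a single path: reading top to bottom and descending on each link is a depth-first (pre-order) walk. A minimal sketch, assuming each page is a hypothetical (title, children) pair; the page names are made up for illustration.

```python
def reading_order(node):
    """Yield page titles in the order a reader meets them: the page, then each child in turn."""
    title, children = node
    yield title
    for child in children:
        yield from reading_order(child)


# Hypothetical site: a homepage, two tables of contents, three posts.
site = ("Home", [
    ("Part 1", [("Post A", []), ("Post B", [])]),
    ("Part 2", [("Post C", [])]),
])
```

`list(reading_order(site))` flattens the tree into one ordered list, which is the "clear path" a chaotically hyperlinked graph doesn't give you.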

Comment by pattern on Coercive Formats · 2019-06-09T04:48:36.739Z · score: 4 (5 votes) · LW · GW

Great post by the way.

a wiki-style format, where prerequisite concepts are obsessively hyperlinked, works fine.
Uncoercive formats can create a tendency for attention to jump all over the place and not spend enough time on any one topic.

There's a concept here (possibly what the OP meant by "coercive") which I might call "structure". One of the downsides to wikis (or parts of wikis) is, what if you wanted to read them? Is there a good order? Usually*, no. This limits their usefulness. Wikipedia is as if someone who didn't know what a textbook was reinvented it badly - pages everywhere, connected by string in random places where one page's title is mentioned on another page. The fact that you can only (maybe) find something if you know you're looking for it, and exactly what it's called, limits one of the most useful aspects of books of knowledge - the chance to learn things you don't already know.

The Sequences are unusual in this regard (there's an order!) which is why I've read them. (One of the downsides of the medium was that I didn't initially realize that. If I read a physical copy of Lord of The Rings, I'd know I finished it.)

*I'm not aware of any counter-examples.

Minor errata:

in a manner heavily bias[ed] toward

Comment by pattern on Mistakes with Conservation of Expected Evidence · 2019-06-09T04:17:44.427Z · score: 2 (2 votes) · LW · GW
"If you can't provide me with a reason, I have to assume you're wrong."

One: I make a mathematical claim, while talking to a (smart) mathematician. They say "That doesn't * hold. **"

Two: I explain the proof/the conditions for the proof. The mathematician says, "Right, it holds under those conditions."

The only problem is, when I can't generate a proof. Then "One" can happen, but not "Two".



**in all cases.

Comment by pattern on Word-Idols (or an examination of ties between philosophy and horror literature) · 2019-06-09T03:39:29.678Z · score: 3 (2 votes) · LW · GW

Your link has an extra parenthesis. ")"

Comment by pattern on Site Guide: Personal Blogposts vs Frontpage Posts · 2019-06-06T22:38:34.454Z · score: 1 (1 votes) · LW · GW

I think it would be useful to distinguish between "deleting" content, and only allowing the OP to see it. While reasons for not being transparent about what you'd delete make sense, having to back stuff up* in case it gets deleted (as opposed to "taken down") would be a pain.

*Particularly posts (etc.) which require time and effort to polish into a good, publishable ("post worthy") form.

Comment by pattern on Economics majors and earnings: further exploration · 2019-06-06T22:29:10.571Z · score: 3 (2 votes) · LW · GW
And having that much access to money changes the way you think.

I'd say this seems like it'd be a thing with regards to salary* - if you have more (as opposed to just getting by), then you have more resources that can be directed towards investment.

*With the caveat that if you live somewhere that's really expensive, maintaining the status quo might eat up all your funds/resources, so you might not have stuff left over for investing. If your job starts paying you $1000 more per year, but your rent starts costing you $1000 more per year, you gain zero benefit. (If you have to work harder/longer now then the overall gain is negative.)

It makes you think more strategically and more rationally.

It's not immediately clear whether there actually is a change in behavior, as opposed to good selection. (Though it does make sense that gaining more experience (with money) leads to greater skill.) It also might be good to unpack what you mean by "rationally":

It inclines you to value things that can be easily quantified (like cost, income, time) more than things that are not so easily quantified (like happiness, quality of life, and moral motivations).

This doesn't sound very "rational", it sounds like a Goodhart mistake.

Also, one would think that having a lot would enable one to focus on these more, as opposed to less. (Again, is this a change in behavior, or the result of selection - a) people who make a lot of money might be trading other things for money. b) people who make a lot of money might overvalue it?)

But what about everyone else,

The (case for the) Baumol effect argues differently - you have to pay people more to work in other industries, even if they haven't experienced growth, to pay them (closer to) the (new) opportunity cost. This is (supposedly) why prices go up (and why everything is so expensive).

However, morally-directed, aesthetically-oriented, protectively-focused, and corporate-structure-maintaining organizations benefit society in ways that are harder to quantify or monetize. Therefore, less money is made.

This is why there are other business models. (Libraries and parks, NGOs, non-profits - these (supposedly) aren't run the same way as corporations per se.)

It would be interesting if these/theories could somehow be tested.

A number of my comments (above) were about possible things to control for.

Comment by pattern on Steelmanning Divination · 2019-06-06T20:15:23.633Z · score: 1 (1 votes) · LW · GW

You can guess. You can roll the die yourself (and guess that it came up the same way). You can also examine the die, and then guess.

If I throw an additional dice it doesn't help determine what already thrown dice are. Your expectation doesn't shift so no probability can shift.

Also, this contains some assumptions that aren't always correct. I can throw a die a bunch of times, notice that it comes up "6" or "1" an awful lot, and conclude it's weighted. (A shift in expectation.)
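
That shift can be made explicit with a small Bayesian update. This is only a sketch under assumptions I'm making up for illustration: two hypotheses (a fair die, and a weighted die that shows 1 or 6 a third of the time each), and a 1% prior that the die is weighted.

```python
from fractions import Fraction

FAIR = {face: Fraction(1, 6) for face in range(1, 7)}
# Hypothetical weighted die: 1 and 6 each come up a third of the time.
WEIGHTED = {1: Fraction(1, 3), 2: Fraction(1, 12), 3: Fraction(1, 12),
            4: Fraction(1, 12), 5: Fraction(1, 12), 6: Fraction(1, 3)}


def posterior_weighted(rolls, prior=Fraction(1, 100)):
    """Probability the die is the weighted one, given the observed rolls."""
    like_fair = Fraction(1)
    like_weighted = Fraction(1)
    for face in rolls:
        like_fair *= FAIR[face]
        like_weighted *= WEIGHTED[face]
    numerator = prior * like_weighted
    return numerator / (numerator + (1 - prior) * like_fair)


def prob_next_is_six(rolls, prior=Fraction(1, 100)):
    """Expectation for the next roll, mixing both hypotheses by their posterior weight."""
    p = posterior_weighted(rolls, prior)
    return p * WEIGHTED[6] + (1 - p) * FAIR[6]
```

With no rolls the chance of a six stays close to 1/6; after a run of ten rolls that are all 1s and 6s, the posterior on "weighted" rises from 1% to over 90%, and the expectation for the next roll moves with it.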

Comment by pattern on Pattern's Shortform Feed · 2019-06-06T19:50:13.459Z · score: 0 (0 votes) · LW · GW

Comment by pattern on Pattern's Shortform Feed · 2019-06-06T19:49:52.276Z · score: 0 (0 votes) · LW · GW

My own (long?):

Comment by pattern on Pattern's Shortform Feed · 2019-06-06T19:49:33.745Z · score: 2 (1 votes) · LW · GW

Since bookmarking comments hasn't been implemented yet, I think I'll put them here.

(Without votes so they don't clog up space on recent discussion.)

Also, comments on these might better go on the page where they are.

Comment by pattern on Deceptive Alignment · 2019-06-06T18:15:56.634Z · score: 7 (3 votes) · LW · GW

And wow, this turned out longer than I thought it would. It's in 6 sections:

1. Starting with models versus learning models.

2. Is the third condition for deceptive alignment necessary?

3. An alternative to, or form of, treacherous turn: Building a successor.

4. Time management: How deceptive alignment might not be a lot more computationally expensive, and why treacherous turns might have a time delay.

5. The model of a distributional shift, and its relationship to the model of training followed by deployment.

6. Miscellaneous


The mesa-optimizer must be able to model the fact that it is being selected to achieve a particular base objective and must have some model of what that objective is.[6]

I'd distinguish "have" in the sense of it starting out with this information from "learning" where it 'realizes' this. (If it was all about agent generation, then an aligned agent might be simpler because it's generated with the base objective as its objective while an unaligned agent is longer/more complex because it's generated with the base objective and its own objective.*)


If the third condition is not satisfied, then the mesa-optimizer will not expect any benefit from temporarily acting in accordance with the base objective.

Here's a toy model: Let M denote the mesa objective, and B denote the base objective. Let U denote how the mesa-optimizer values things, with U(M) > U(X) for all X such that X != M. Now suppose the mesa-optimizer, rather than just selecting the action A that maximizes U(A), instead wishes to maximize the utility of the sum of all its actions, and so chooses the action A that maximizes U(A)*T(A), where T(A) is how long it will run/expects to run for if it takes action A. Now the mesa-optimizer may expect a benefit from doing B instead of M if: U(B)*T(B) > U(M)*T(M). Note that while, by definition (assuming B != M), U(M) > U(B), the action A that maximizes U(A)*T(A) may be B rather than M, even if the third condition:

The mesa-optimizer must expect the threat of modification[7] to eventually go away, either due to training ending or because of actions taken by the mesa-optimizer.

is not met.

I.E. if it expects to not be modified (as soon) if it does B rather than M, then it may pursue B if it likes doing B for longer, more than it likes doing M for a shorter period of time.
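
The toy model above can be sketched in a few lines. The numbers are invented purely for illustration: U prefers M per step, but T says the optimizer expects to run ten times longer if it does B.

```python
def best_action(utility, runtime):
    """Pick the action A maximizing U(A) * T(A)."""
    return max(utility, key=lambda action: utility[action] * runtime[action])


# Hypothetical values: M is preferred per step, B is tolerated by the selection process.
utility = {"M": 1.0, "B": 0.6}           # U(M) > U(B), as in the toy model
runtime_if_modified = {"M": 10, "B": 100}   # doing M gets it modified sooner
runtime_if_ignored = {"M": 100, "B": 100}   # no threat of modification either way
```

Here `best_action(utility, runtime_if_modified)` picks B (0.6 * 100 > 1.0 * 10), while `best_action(utility, runtime_if_ignored)` picks M. Nothing in the calculation requires the threat of modification to ever go away.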


Pure deception. If the mesa-optimizer stops trying to optimize its own objective in the short term and focuses on cooperating with the selection process entirely then this may result in its objective “crystallizing.”

Aside from the appeals of self-modification, planning to build a better version of itself at a later date also seems like a concern, although it would probably be specific to certain possible use-cases in the real world. (It's unclear how a self-driving car would pull this off physically, software aside.)


Deceptive and corrigible alignment might be more computationally expensive. In the case of deceptive alignment, simultaneously working on optimizing the base objective and working on identifying good opportunities to defect is likely to require more computation than just doing the former.

Depends on how much it's doing it. If we assume it has a memory:

An important special case is when the mesa-optimizer has access to cross-episodic state. Such state could be used as memory,

It could delay that, or do some sort of threading. I.e. if it thinks training is going to last a long time (there's a sort of "doomsday" type argument I could make that I should expect to live to twice my current age, whatever that is. On similar grounds, a mesa-optimizer might suppose that training will end when its age has doubled.) then, rather than looking for opportunities all the time, it might put that on the backburner, or devote less processing to it (while things stay the same). This suggests that a mesa-optimizer might a) execute a treacherous turn later in its deployment rather than right away, though later may be in terms of its time, and if it's given more resources/compute, we might not notice a difference, or b) 'slow down'/use more compute to look for such opportunities when things change, or if situations match desiderata it has specified for what a good treacherous opportunity might look like.


Thus, a deceptively aligned mesa-optimizer will be looking for ways to determine whether or not it is in training—that is, for the presence of a distributional shift.

And here I thought it'd be an end to/a shift in modification. While we'd expect that, I'm unclear on how a mesa-optimizer would know to expect that.

This model** assumes distributional shift (when transitioning from training to deployment). This assumes: 1) that training is not followed by a (noticeable) verification phase before deployment, 2) one consistent distribution in training (no phases).

An example of phases might be

For example, training the system in a simulated environment and deploying it in the real world would likely involve enough detectable distributional shift to make it infeasible under this approach.

a series of simulated environments, each more complex than the last, followed by the real world (or real Dota or something).

Basically, it's worth asking how a given plan might be different from (or more specific than) just "training followed by deployment", and thinking about what effects that might have.**


Minor errata (in the last footnote):

incentivized to cause there [to] be a

*Also, we're putting a lot of stuff together in our notion of "objective". It might be useful to unpack how the parts/complexity of an objective relate to (inner) alignment. For example, we want self-driving cars to not crash, but if someone crosses the road and the car can both avoid hitting the person and avoid crashing (say, by stopping quickly), we want it to do that. There's both a task (drive from point A to B, stay on the road (don't crash)) and safety (don't run into people/(big) things crossing the road).

**What about continuous deployment? Windows wasn't just released into the wild; the OS has had updates. Most products seem to work like this. (Self-driving cars might be different; I can see why they wouldn't need updates as often as iPhones.) I don't know a lot about Siri and Alexa, but they keep getting changed to add new features and such.

Comment by pattern on Steelmanning Divination · 2019-06-06T17:20:53.348Z · score: 0 (2 votes) · LW · GW

The question is, does mixing up a deck of flash cards randomly help with memorizing them?

EDIT: I was actually serious; it's an empirical question.

Comment by pattern on Steelmanning Divination · 2019-06-06T17:18:23.534Z · score: 2 (2 votes) · LW · GW

How does battle outcome relate to barometric pressure, and the movement of air pretty high up?

Comment by pattern on Steelmanning Divination · 2019-06-06T17:16:39.371Z · score: 1 (1 votes) · LW · GW

Someone rolls a die, and writes down the result. How do you guess what they rolled?

Comment by pattern on Steelmanning Divination · 2019-06-06T17:15:15.590Z · score: 6 (4 votes) · LW · GW

1. This post addressed that - pair your RNG with an advice table.

2. That's because you don't give meaning to "numbers". Try a random word/sentence/advice generator.

Comment by pattern on Steelmanning Divination · 2019-06-06T17:13:51.156Z · score: -1 (2 votes) · LW · GW

Because its output is a number, as opposed to information (in word form)? Or because there's no reason the (P)RNG would be correlated with the solution to the problem you wish to solve/what you want information about?

Comment by pattern on Can movement from Conflict to Mistake theorist be facilitated effectively? · 2019-06-05T21:26:44.460Z · score: 1 (1 votes) · LW · GW

Talk is cheap - politics also includes people making plenty of statements they're not willing to physically fight over.

Comment by pattern on FB/Discord Style Reacts · 2019-06-05T19:45:22.938Z · score: 2 (2 votes) · LW · GW

A vision of hell.

Comment by pattern on Yes Requires the Possibility of No · 2019-06-05T19:38:36.991Z · score: 1 (1 votes) · LW · GW

10 is vague, and lacks examples. (Is it the Sorites paradox?)

11 is great. (Though it does raise the question - if you can only see upvotes minus downvotes, how do you know whether a score of 1 indicates that no one cared, or that everyone cared and was split both ways?)
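To make that ambiguity concrete, here is a minimal sketch. The function name `net_score` and the vote counts are made up for illustration, not taken from any real thread. Very different vote patterns collapse to the same displayed number:

```python
def net_score(upvotes: int, downvotes: int) -> int:
    """Score as displayed when only upvotes minus downvotes is visible."""
    return upvotes - downvotes

# Hypothetical vote patterns (illustrative numbers, not real data).
barely_noticed = (1, 0)     # one voter; nobody else cared
hotly_contested = (51, 50)  # many voters, split almost evenly

print(net_score(*barely_noticed))
print(net_score(*hotly_contested))
```

Both calls give the same score, so the net number alone cannot distinguish the two cases; showing a vote count next to the score is one way to tell them apart.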

Comment by pattern on The Inner Alignment Problem · 2019-06-04T20:29:52.859Z · score: 2 (1 votes) · LW · GW
Subprocess interdependence. There are some reasons to believe that there might be more initial optimization pressure towards proxy aligned than robustly aligned mesa-optimizers. In a local optimization process, each parameter of the learned algorithm (e.g. the parameter vector of a neuron) is adjusted to locally improve the base objective conditional on the other parameters. Thus, the benefit for the base optimizer of developing a new subprocess will likely depend on what other subprocesses the learned algorithm currently implements. Therefore, even if some subprocess would be very beneficial if combined with many other subprocesses, the base optimizer may not select for it until the subprocesses it depends on are sufficiently developed. As a result, a local optimization process would likely result in subprocesses that have fewer dependencies being developed before those with more dependencies.

On the one hand, this makes it sound like, instead of new ones (neurons? sets of neurons?) being created, existing neurons are likely to be re-used. On the other hand, "One pixel attack for fooling deep neural networks" almost seems to ask "are subprocesses with lots of dependencies* ever made?"

*High(er) level processes.

Comment by pattern on An1lam's Short Form Feed · 2019-06-04T20:20:29.749Z · score: 3 (2 votes) · LW · GW

It seems like a useful idea on a lot of levels.

There's a difference between solving a problem where you're 1) trying to figure out what to do, 2) executing an algorithm, or 3) evaluating a closed-form solution (plugging the values into the equation, performing the operations, and seeing what the number is).***

Names. If you're writing a program, and you decide to give things (including functions/methods) names like the letters of the alphabet, it's hard for other people to understand what you're doing, including future you. As a math enthusiast I see the benefit of not having to generate names*, but teaching-wise? I can see some benefits of merging/mixing. (What's sigma notation? It's a for loop.)
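A minimal sketch of the "sigma notation is a for loop" remark. The helper name `sigma` and its parameters are made up for illustration:

```python
def sigma(term, lower: int, upper: int):
    """Sum term(i) for i = lower, ..., upper (inclusive), like big-sigma notation."""
    total = 0
    for i in range(lower, upper + 1):
        total += term(i)
    return total

# The sum of i^2 for i = 1..4, i.e. 1 + 4 + 9 + 16.
print(sigma(lambda i: i ** 2, 1, 4))  # 30
```

When the lower bound exceeds the upper bound the loop body never runs and the result is 0, which matches the usual convention for an empty sum.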

Functions. You can say f' is the derivative of f. Or you can get into the fact that there are functions** that take other functions as arguments. You can focus narrowly on functions of one variable. Or you can notice that + is a function that takes two numbers (just like *, /, ^).
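A minimal sketch of both points, under stated assumptions: `derivative` is a made-up helper that approximates f' numerically with a central difference (a stand-in for the exact derivative), while the `operator` functions are Python's standard-library versions of +, *, / and ^:

```python
import operator

def derivative(f, h: float = 1e-6):
    """Take a function f and return a new function approximating f'."""
    def f_prime(x: float) -> float:
        # Central difference: (f(x + h) - f(x - h)) / (2h).
        return (f(x + h) - f(x - h)) / (2 * h)
    return f_prime

def square(x: float) -> float:
    return x * x

d_square = derivative(square)  # a function built from another function
print(d_square(3.0))           # should be close to 6

# The arithmetic operators, written as ordinary two-argument functions.
print(operator.add(2, 3))      # 5
print(operator.mul(2, 3))      # 6
print(operator.truediv(6, 3))  # 2.0
print(operator.pow(2, 3))      # 8
```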

*Like when your idea of what you're doing with something changes as you go, and there's no refactoring tool on paper to change all the names at the last minute. (Though paper feels pretty nice to work with. That technology is really ergonomic.)

**And that the word function has more than one meaning. There's a bit of a difference between a way of calculating something and a lookup table.

***Also, seeing how things generalize can be easier with tools that can automatically check if the changes you've made have broken what you were making. (Writing tests.)
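A minimal sketch of that last footnote, assuming a made-up example: a sum-of-squares function has been generalized to arbitrary powers (the name `power_sum` is hypothetical), and a small test checks that the original behaviour still holds.

```python
def power_sum(n: int, p: int = 2) -> int:
    """Sum of i**p for i = 1..n; the default p=2 is the original sum of squares."""
    return sum(i ** p for i in range(1, n + 1))

def test_power_sum() -> None:
    # Old behaviour (sum of squares) must survive the generalization.
    assert power_sum(3) == 14
    assert power_sum(0) == 0
    # New behaviour introduced by the generalization.
    assert power_sum(3, p=1) == 6
    assert power_sum(3, p=3) == 36

test_power_sum()  # raises AssertionError if a change broke something
```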

Comment by pattern on Habryka's Shortform Feed · 2019-06-04T19:43:05.566Z · score: 3 (2 votes) · LW · GW

There's a lot of focus on personally updating based on evidence. Groups aren't addressed as much. What does it mean for a group to have a belief? To have honesty or integrity?

Comment by pattern on Agents dissolved in coffee · 2019-06-04T19:01:24.615Z · score: 2 (2 votes) · LW · GW
I don't know yet, but I will [keep] it in mind.

Minor errata.

Comment by pattern on How is Solomonoff induction calculated in practice? · 2019-06-04T18:48:53.292Z · score: 5 (3 votes) · LW · GW

AIXI is related. Being based on Solomonoff induction, it is also incomputable. However, there's AIXItl, which is an approximation (with memory and time constraints), and it's okay at something like first-person Pac-Man, so there probably is an approximation to Solomonoff induction. I don't know how useful it is, and I've never seen it used.

Comment by pattern on How is Solomonoff induction calculated in practice? · 2019-06-04T18:44:34.362Z · score: 4 (3 votes) · LW · GW

First of all, this sounds weird - Solomonoff induction isn't about scoring two hypotheses. It includes something which does (something like) that, and it does that for all possible programs/hypotheses (which is one reason why people are saying it's uncomputable*). It also has a universal prior, and as it gets new evidence it updates that (infinite) prior probability distribution (which is the second reason why it's uncomputable*).

*a.k.a.: this takes literally forever.
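For reference, the universal prior is usually written roughly as below. This is the standard textbook form rather than anything from the question, and notation varies between sources. The symbols U, p, x and a are as explained in the comments:

```latex
% U: a universal monotone Turing machine; |p|: the length of program p in bits.
% The sum ranges over (minimal) programs p whose output begins with the string x.
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-|p|}

% Prediction: the probability that x is continued by the symbol a.
M(a \mid x) = \frac{M(xa)}{M(x)}
```

The sum runs over infinitely many programs, and deciding which of them ever produce x runs into the halting problem; both points line up with the two footnoted reasons above.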

Comment by pattern on FB/Discord Style Reacts · 2019-06-03T22:42:26.499Z · score: 2 (1 votes) · LW · GW
I think something LessWrong needs to do is [make] nuanced critiques easier to chunk.
Comment by pattern on Moral Mazes and Short Termism · 2019-06-03T22:19:11.632Z · score: 2 (1 votes) · LW · GW
To use another example, if I was pathologically dishonest, I would prefer doing business with honest people rather than others like me. I'd certainly prefer honest dedicated subordinates to scheming backstabbing ones.

I'm not sure if the metaphor fits (and this is fictional evidence), but Ocean's Eleven (or whatever number it is now) is about a team of thieves. They all have an eye on the common goal. It would not make sense for them to recruit non-thieves to the team, because an "honest, law-abiding person" might turn them in to the police.

When they're doing business (robbing people) they prefer non-thieves, who are easier to rob. But when they're putting together a team for a heist, they prefer thieves. (This seems to make sense game-theory-wise.)

Comment by pattern on What is the evidence for productivity benefits of weightlifting? · 2019-06-03T22:09:31.446Z · score: 6 (3 votes) · LW · GW

I think this answer would be better as a comment.

Comment by pattern on Conditions for Mesa-Optimization · 2019-06-03T18:57:16.255Z · score: 2 (1 votes) · LW · GW

That was helpful, thank you.

Comment by pattern on Conditions for Mesa-Optimization · 2019-06-03T02:15:26.204Z · score: 1 (2 votes) · LW · GW

Is a mesa-optimizer unaligned by definition?

Comment by pattern on Welcome and Open Thread June 2019 · 2019-06-03T00:48:25.479Z · score: 2 (1 votes) · LW · GW
For a true Bayesian, information would never have negative expected utility.

That's because they already have it (in a sense that we don't). They know every way any experiment could go (if not which one it will).
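One standard way to write the quoted claim down, as a sketch under assumptions not stated in the thread: the information is free, the agent is an ideal expected-utility maximizer, and the distribution of the experiment's outcome x does not depend on the action a. The symbols are illustrative.

```latex
% a: an action, x: the outcome of an experiment, U: utility.
% Deciding after seeing x is at least as good, in expectation, as deciding before:
\mathbb{E}_{x}\!\left[\max_{a} \mathbb{E}[U \mid a, x]\right]
  \;\ge\; \max_{a} \mathbb{E}_{x}\!\left[\mathbb{E}[U \mid a, x]\right]
  \;=\; \max_{a} \mathbb{E}[U \mid a]
```

The inequality holds because the agent can always ignore x and pick the same action it would have picked anyway; the outer expectation over every possible x is the sense in which it already "knows every way any experiment could go".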

I understand that this means to just ask for advice, not necessarily follow it.

You have more at stake than they do. (Also watch out for if they have vested interests.)

Comment by pattern on Selection vs Control · 2019-06-02T23:18:02.695Z · score: 1 (1 votes) · LW · GW

It might be based on the fact that it produces agents.

I wasn't clear on whether this was more of a control thing or a selection thing - when looking at an agent, we care about what it does on its own. But we're also interested in "evolution's future outputs".

Pattern's Shortform Feed

2019-05-30T21:21:23.726Z · score: 13 (3 votes)

[Accidental Post.]

2018-09-13T20:41:17.282Z · score: -6 (1 votes)