Posts

[Linkpost] The importance of stupidity in scientific research 2022-06-19T05:17:52.557Z
World's First Octopus Farm - Linkpost to a Linkpost 2022-02-12T23:31:57.521Z
Open & Welcome Thread December 2021 2021-12-01T19:57:32.520Z
Open & Welcome Thread - December 2020 2020-12-01T17:03:48.263Z
Pattern's Shortform Feed 2019-05-30T21:21:23.726Z

Comments

Comment by Pattern on Please don't throw your mind away · 2023-02-17T18:14:47.248Z · LW · GW

T: "Gah, I really need to write this up as a blog post. Giving an example that I'm not really into at the moment seems kind of bad. But ok, so, [[goes on a rambling five-minute monologue that starts bored and boring, until visibly excited about some random thing like the origin of writing or the implausibility of animals with toxic flesh evolving or emergent modularity enabled by gene regulatory networks or the revision theory of truth or something; see the appendix for examples]]"

Still reading the rest of this.

 

"Playful Thinking" (Curiosity driven exploration) may be serendipitous for other stuff you're doing, but there isn't a guarantee. Pursuing it because you want to might help you learn things. Overall it is (or can be) one of the ways you take care of yourself.

A focus only on value which is measurable in one way can miss important things. That doesn't mean that way should be ignored, but by taking more ways into account, you might get a more complete picture. If in the future we have better ways of measuring important things, and you want to work on that, maybe that could make a big difference.

 

Overall, 'I have to be able to justify why I'm working on this' seems like the wrong approach. This is definitely the case for how you spend ALL of your time. This becoming the default isn't justified. ('Your desire to learn will help you learn.' 'That sounds inefficient.' 'What?')

Comment by Pattern on In Defense of Chatbot Romance · 2023-02-16T22:10:47.524Z · LW · GW

It might be a browser compatibility issue?

This should be spoilered. I typed it, and didn't copy paste it. 

Comment by Pattern on In Defense of Chatbot Romance · 2023-02-16T22:08:53.612Z · LW · GW

This seems like it might be useful to post to that subreddit.

Comment by Pattern on [S] D&D.Sci: All the D8a. Allllllll of it. · 2023-02-16T21:38:29.653Z · LW · GW

What are people using to load and analyze the data?

Comment by Pattern on Human-AI collaborative writing · 2023-02-16T20:12:44.946Z · LW · GW

Is your writing online anywhere?

Comment by Pattern on If instead of giving out dividends, public companies bought total market index funds, which companies would be the biggest? · 2022-12-10T00:51:38.113Z · LW · GW

Usually posts with: "D&D.Sci" in the title.

Comment by Pattern on Challenges to Yudkowsky's Pronoun Reform Proposal · 2022-12-04T02:38:50.667Z · LW · GW

As a speaker of a native language that has only genderneutral pronouns and no gendered ones, I often stumble and misgender people out of disregard of that info because that is just not how referring works in my brain. I suspect that natives don't have this property and the self-reports are about them.

What language is this?

Comment by Pattern on Did ChatGPT just gaslight me? · 2022-12-04T02:17:20.188Z · LW · GW

It reminds me of a move made in a lawsuit.

Comment by Pattern on Did ChatGPT just gaslight me? · 2022-12-04T02:16:10.608Z · LW · GW

But you said that I should use orange juice as a replacement because it's similarly sweet.

Does ChatGPT think tequila is sweet, orange juice is bitter...or is it just trying to sell you drinks?*

tequila has a relatively low alcohol content

Relative to what ChatGPT drinks no doubt.

And tequila doesn’t have any sugar at all.

*Peer pressure you into it drinking it maybe.

At best this might describe some drinks that have tequila in them. Does it know the difference between "tequila" and "drinks with tequila"?

 

Does ChatGPT not differentiate between sweet and sugar, or is ChatGPT just an online bot that improvises everything, and gaslights you when it's called on it? It keeps insisting:

..."I was simply pointing out that both orange juice and tequila can help to balance out the flavors of the other ingredients in the drink, and that both can add a nice level of sweetness to the finished beverage."...

Does someone want to try the two recipes out and compare them?

Comment by Pattern on COVID-19 Group Testing Post-mortem? · 2022-10-06T18:42:19.694Z · LW · GW

these success stories seem to boil down to just buying time, which is a good deal less impressive.

The counterpart to 'faster vaccination approval' is 'buying time' though. (Whether or not it ends up being well used, it is good at the time. The other reason to focus on it is - how much can you affect pool testing versus vaccination approval speed? Other stuff like improving statistical techniques might be easier for a lot of people than changing a specific organization.

Comment by Pattern on Bruce Wayne and the Cost of Inaction · 2022-10-06T18:31:02.229Z · LW · GW

Overall this was pretty good.

 

That night, Bruce dreamt of being a bat, of swooping in to save his parents. He dreamt of freedom, and of justice, and of purity. He dreamt of being whole. He dreamt of swooping in to protect Alfred, and Oscar, and Rachel, and all of the other good people he knew.

The part about "purity" didn't make sense.

 

Bruce would act.

This is bit of a change from before - something more about the mistake seems like it would make more sense. Not worry. ('Bruce would get it right this time' or something about 'Bruce would act (and it would make things better this time)'.) 'Bruce wouldn't be afraid' maybe?

Comment by Pattern on Ars D&D.sci: Mysteries of Mana · 2022-07-14T01:59:01.079Z · LW · GW

I was thinking

The rules don't change over time, but what if on...the equivalent of the summer solstice, fire spells get +1 fire mana or something. i.e, periodic behavior. Wait, I misread that. I meant more like, rules might be different, say, once every hundred years (anniversary of something important) - like there's more duels that day, so you might have to fight multiple opponents, or something. 

This is a place where people might look at the game flux, and go 'the rules don't change'. 

Comment by Pattern on Looking back on my alignment PhD · 2022-07-09T19:31:50.658Z · LW · GW

Our world is so inadequate that seminal psychology experiments are described in mangled, misleading ways. Inadequacy abounds, and status only weakly tracks adequacy. Even if the high-status person belongs to your in-group. Even if all your smart friends are nodding along.

It says he started with the belief. Not, that he was right, or ended with it. Keeping the idea contained to the source, so it's clear it's not being stated could be improved, yes.

Comment by Pattern on Where I agree and disagree with Eliezer · 2022-07-09T18:32:14.703Z · LW · GW

This is what would happen if you were magically given an extraordinarily powerful AI and then failed to aligned it,

Magically given a very powerful, unaligned, AI. (This 'the utility function is in code, in one place, and can be changed' assumption needs re-examination. Even if we assert it exists in there*, it might be hard to change in, say, a NN.)

* Maybe this is overgeneralizing from people, but what reason do we have to think an 'AI' will be really good at figuring out its utility function (so it can make changes without changing it, if it so desires). The postulate 'it will be able to improve itself, so eventually it'll be able to figure everything out (including how to do that)', seems to ignore things like 'improvements might make it more complex and harder to do that while improving.' Where and how do you distinguish between 'this is my utility function' and 'this is a bias I have'? (How have you improved this, and your introspecting abilities? How would a NN do either of those?)

 

One important factor seems to be that Eliezer often imagines scenarios in which AI systems avoid making major technical contributions, or revealing the extent of their capabilities, because they are lying in wait to cause trouble later. But if we are constantly training AI systems to do things that look impressive, then SGD will be aggressively selecting against any AI systems who don’t do impressive-looking stuff. So by the time we have AI systems who can develop molecular nanotech, we will definitely have had systems that did something slightly-less-impressive-looking.

Now there's an idea: due to competition, AIs do impressive things (which aren't necessarily safe). An AI creates the last advance that when implemented causes a FOOM + bad stuff.

Eliezer appears to expect AI systems performing extremely fast recursive self-improvement before those systems are able to make superhuman contributions to other domains (including alignment research),

This doesn't necessarily require the above to be right or wrong - human level contributions (which aren't safe) could, worst case scenario...etc.

 

[6.] Many of the “pivotal acts”

(Added the 6 back in when it disappeared while copying and pasting it here.)

There's a joke about a philosopher king somewhere in there. (Ah, if only we had, an AI powerful enough to save us from AI, but still controlled by...)

 

I think Eliezer is probably wrong about how useful AI systems will become, including for tasks like AI alignment, before it is catastrophically dangerous.

I think others (or maybe the OP previously?) have pointed out that AI can affect the world in big ways way before 'taking it over'.  Domain limited, or 'sub-/on par with/super-' 'human performance', doesn't necessarily matter which of those it is (though more power -> more effect is the expectation). Some domains are big.

Comment by Pattern on Ars D&D.sci: Mysteries of Mana · 2022-07-09T18:16:55.732Z · LW · GW

Spoilering/hiding questions. Interesting.

Do the rules of the wizards' duels change depending on the date?

I'll aim to post the ruleset and results on July 18th (giving one week and both weekends for players).  If you find yourself wanting extra time, comment below and I can push this deadline back.

The dataset might not have enough info for this/rules might not be deep enough, but a wizards duel between analysts, or 'players', also sounds like it could be fun.

Comment by Pattern on Open & Welcome Thread - July 2022 · 2022-07-09T02:01:36.322Z · LW · GW

I think that is a flaw of comments, relative to 'google docs'. Long documents without the referenced areas being tagged in comments, might make it hard to find other people asking the same question you did, even if someone wondered about the same section. (And the difficulty of ascertaining that quickly seems unfortunate.)

Comment by Pattern on Open & Welcome Thread - July 2022 · 2022-07-09T01:57:55.758Z · LW · GW

It also possesses the ability to levitate and travel through solid objects. 

How is it contained?

Comment by Pattern on Feature proposal: Close comment as resolved · 2022-07-01T16:26:58.245Z · LW · GW

It's still a trivial inconvenience sometimes, but:

Two tabs:

one for the response comment writing as reading

one for the reading

 

Note, sometimes people downvote typo comments. Doesn't happen often, but, sometimes it seems like, when the author fixes it, it happens?

Comment by Pattern on Most Functions Have Undesirable Global Extrema · 2022-06-30T21:58:57.919Z · LW · GW

For example, if our function measures the probability that some particular glass is filled with water, the space near the maximum is full of worlds like “take over the galaxy and find the location least likely to be affected by astronomical phenomena, then build a megastructure around the glass designed to keep it full of water”.

If the function is 'fill it and see it is filled forever' then strange things may be required to accomplish that (to us) strange goal.

 

Idea:

Don’t specify our goals to AI using functions.

Flaw:

Current deep learning methods use functions to measure error, and AI learns by minimizing that error in an environment of training data. This has replaced the old paradigm of symbolic AI, which didn’t work very well. If progress continues in this direction, the first powerful AI will operate on the principles of deep learning.

Even if we build AI that doesn’t maximize a function, it won’t be competitive with AI that does, assuming present trends hold. Building weaker, safer AI doesn’t stop others from building stronger, less safe AI.

Do you have any idea how to do "Don’t specify our goals to AI using functions."? How are you judging "if we build AI that doesn’t maximize a function, it won’t be competitive with AI that does"?

 

Idea:

Get multiple AIs to prevent each other from maximizing their goal functions.

Flaw:

The global maximum of any set of functions like this still doesn’t include human civilization. Either a single AI will win, or some subset will compete among themselves with just as little regard for preserving humanity as the single AI would have.

Maybe this list should be numbered.

 

This one is worse than it looks (though it seems underspecified). Goal 1: some notion of human flourishing. Goal 2: prevent goal 1 from being maximized. (If this is the opposite of 1, you may have just asked to be nuked.)

 

Idea:

Don’t build powerful AI.

Flaw:

For all the 'a plan that handles filling a glass of water, generated using time t' 'is flawed' - this could actually work. Now, one might object that a particular entity will try to create powerful AI. While there might be incentives to do so, trying to set limits, or see safeguard deployed (if the AI managing air conditioning isn't part of your AGI research, add these safeguards now). 

This isn't meant as a pure 'this will solve the problem' approach, but that doesn't mean it might not work (thus ensuring AIs handling cooling/whatever at data centers meet certain criteria). 

 

Once it exists, powerful AI is likely to be much easier to generate or copy than historical examples of dangerous technologies like nuclear weapons.

There's a number of assumptions here which may be correct, but are worth pointing out.

How big a file do you think an AI is?

1 MB?

1 TB?

That's not to say that compression exists, but also, what hardware can run this program/software you are imagining (and how fast)?

 

undesirable worlds near the global maximum.

There's a lot of stuff in here about maximums. It seems like your beliefs that 'functions won't do' stems from a belief that maximization is occurring. Maximizing a function isn't always easy, even at the level of 'find the maximum of this function mathematically'. That's not to say that what you're saying is necessarily wrong, but suppose some goal is 'find out how this protein folds'. It might be a solvable problem, but that doesn't mean it is an easy problem. It also seems like, if the goal is to fill a glass with water, then the goal is achieved when the glass is filled with water.

Comment by Pattern on LessWrong Has Agree/Disagree Voting On All New Comment Threads · 2022-06-30T20:40:32.917Z · LW · GW

Yeah. When something is very unclear, it's like

Is it good or bad? It's impossible to decipher, I can't tell. Is it true or false? No way to tell. (It doesn't happen often, but it's usually downvoted.)

 

ETA: I'm not sure at the moment what other aspects there are.

Comment by Pattern on LessWrong Has Agree/Disagree Voting On All New Comment Threads · 2022-06-30T19:23:27.585Z · LW · GW

It didn't state that explicitly re sorting, but looking at:

It has no other direct effects on the site or content visibility.

I see what you mean. (This would have been less of a question in a 'magic-less sorting system'.)

Comment by Pattern on LessWrong Has Agree/Disagree Voting On All New Comment Threads · 2022-06-30T19:19:49.247Z · LW · GW

Agree/Disagree are weird when evaluating your comment.

Agree with you asking the question (it's the right question to ask) or disagree with your view?

 

I read Duncan's comment as requesting that the labeling of the buttons be more explicit in some way, though I wasn't sure if it was your way. (Also Duncan disagreeing with what they reflect).

Comment by Pattern on LessWrong Has Agree/Disagree Voting On All New Comment Threads · 2022-06-30T19:17:11.669Z · LW · GW

Upvote (Like**)

  • Quality*

Agreement (Truth)

  • Veracity

Not present***: Value?Judgement? (Good/bad)

  • Good/Bad

 

**This is in ()s because it's the word that shows up in bold when hovering over a button.

*How well something is written?

***That is a harsher bold than I was going for.

Comment by Pattern on LessWrong Has Agree/Disagree Voting On All New Comment Threads · 2022-06-30T19:11:55.730Z · LW · GW

I think some aspects of 'voting' might benefit from being public. 'Novelty' is one of them. (My first thought when you said 'can't be downvoted' was 'why?'. My filtering desires for this might be...complex. The simple feature being:

I want to be able to sort by novelty. (But also be able to toggle 'remove things I've read from the list'. A toggle, because I might want it to be convenient to revisit (some) 'novel' ideas.))

Comment by Pattern on LessWrong Has Agree/Disagree Voting On All New Comment Threads · 2022-06-30T19:06:29.147Z · LW · GW

you should also have to be thinking about

Consider replacing this long phrase (above) with 'consider'.

Comment by Pattern on LessWrong Has Agree/Disagree Voting On All New Comment Threads · 2022-06-30T19:04:15.496Z · LW · GW

Upvoting/downvoting self

  • Sorting importance

'Agreeing'/'Disagreeing'

  • 'I have discovered that this (post (of mine)) is wrong in important ways'
  • or
  • Looking back, this has still stood the test of time.

These methods aren't necessarily very effective (here).

 

Arguably, this can be done better by:

Having them be public (likely in text). What you think of your work is also important. ('This is wrong. I'm leaving it up, but also see this post explaining where I went wrong, etc.')

 

See the top of this article for an example: https://www.gwern.net/Fake-Journal-Club 

⁠certainty: log ⁠importance: 4

Comment by Pattern on LessWrong Has Agree/Disagree Voting On All New Comment Threads · 2022-06-30T18:57:14.905Z · LW · GW

How do sorting algorithms (for comments) work now?

Comment by Pattern on Steam · 2022-06-22T14:25:45.655Z · LW · GW

For companies, this is something like the R&D budget. I have heard that construction companies have very little or no R&D. This suggests that construction is a "background assumption" of our society.


Or that research is happening elsewhere. Our society might not give it as much focus as it could though.

Comment by Pattern on Steam · 2022-06-22T14:24:15.189Z · LW · GW

 In the context of quantilization, we apply limited steam to projects to protect ourselves from Goodhart. "Full steam" is classically rational, but we do not always want that. We might even conjecture that we never want that. 

So you never do anything with your full strength, because getting results is bad?

Well, by 'we' you mean both 'you' and 'a thing you are designing with quantilization'.

Comment by Pattern on Instrumental Convergence For Realistic Agent Objectives · 2022-06-21T22:20:28.704Z · LW · GW

It seems to me that in a competitive, 2-player, minimize-resource-competition StarCraft, you would want to go kill your opponent so that they could no longer interfere with your resource loss?

I would say that in general it's more about what your opponent is doing. If you are trying to lose resources and the other player is trying to lose them, you're going to get along fine. (This would be likely be very stable  and common if players can kill units and scavenge them for parts.) If both of you are trying to lose them...

Trying to minimize resources is a weird objective for StarCraft. As is gain resources. Normally it's a means to an end - destroying the other player first. Now, if both sides start out with a lot of resources and the goal is to hit zero first...how do you interfere with resource loss? If you destroy the other player don't their resources go to zero? Easy to construct, by far, is 'losing StarCraft'. And I'm not sure how you'd force a win.

 

This starts to get into 'is this true for Minecraft' and...it doesn't seem like there's conflict of the 'what if they destroy me, so I should destroy them from' kind, so much as 'hey stop stealing my stuff!'. Also, death isn't permanent, so... There's not a lot of non-lethal options. If a world is finite (and there's enough time) eventually, yeah, there could be conflict.

 

More generally, I think competitions to minimize resources might still usually involve some sort of power-seeking.

In the real world maybe I'd be concerned with self nuking. Also starting a fight, and stuff like that - to ensure destruction - could work very well.

Comment by Pattern on Dagger of Detect Evil · 2022-06-21T21:13:21.025Z · LW · GW

stab

You assume they're dead. (It gives you a past measurement - no guarantee someone won't become evil later.)

Comment by Pattern on Dagger of Detect Evil · 2022-06-21T21:12:10.243Z · LW · GW

Okay, but no testing it on yourself, or anyone else you don't want dead. You'd be lucky to lose only a finger, or a hand.

Comment by Pattern on Let's See You Write That Corrigibility Tag · 2022-06-20T20:40:28.992Z · LW · GW

It's a shame we can't see the disagree number and the agree number, instead of their sum.

Comment by Pattern on Let's See You Write That Corrigibility Tag · 2022-06-20T20:37:58.016Z · LW · GW

So far as I know, every principle of this kind, except for Jessica Taylor's "quantilization", and "myopia" (not sure who correctly named this as a corrigibility principle), was invented by myself; eg "low impact", "shutdownability".  (Though I don't particularly think it hopeful if you claim that somebody else has publication priority on "low impact" or whatevs, in some stretched or even nonstretched way; ideas on the level of "low impact" have always seemed cheap to me to propose, harder to solve before the world ends.)

Low impact seems so easy to propose I doubt OP is the first.

 

I believe paulfchristiano has already raised this point, but at what level of 'principles' are being called for?

Myopia seems meant as a means to achieve shutdownability/modifiability.

Likewise for quanilization/TurnTrout's work, on how to achieve low impact.

Comment by Pattern on Let's See You Write That Corrigibility Tag · 2022-06-20T20:34:19.668Z · LW · GW

3. AI which ultimately wants to not exist in future as a terminal goal. Fulfilling the task is on the simplest trajectory to non-existence
 

The first part of that sounds like it might self destruct. And if it doesn't care about anything else...that could go badly. Maybe nuclear badly depending... The second part makes it make more sense though.

 

9. Ontological uncertainty about level of simulation.

So it stops being trustworthy if it figures out it's not in a simulation? Or, it is being simulated?

Comment by Pattern on Let's See You Write That Corrigibility Tag · 2022-06-20T20:31:28.633Z · LW · GW

Also:

Being able to change a system after you've built it.

 

(This also refers to something else - being able to change the code. Like, is it hard to understand? Are there modules? etc.)

Comment by Pattern on Let's See You Write That Corrigibility Tag · 2022-06-20T20:24:58.365Z · LW · GW

I think those are just two principles, not just four.

 

Myopia seems like it includes/leads to 'shutdownability', and some other things.

Low impact: How low? Quantilization is meant as a form of adjustable impact. There's been other work* around this (formalizing power/affecting other's ability to achieve their goals).

*Like this, by TurnTrout: https://www.lesswrong.com/posts/yEa7kwoMpsBgaBCgb/towards-a-new-impact-measure

I think there might be more from TurnTrout, or relating to that. (Like stuff that was intended to explain it 'better' or as the ideas changed as people worked on them more.)

Comment by Pattern on Let's See You Write That Corrigibility Tag · 2022-06-20T20:22:24.524Z · LW · GW

I would set up a "council" of AGI-systems (a system of systems), and when giving it requests in an oracle/genie-like manner I would see if the answers converged. At first it would be the initial AGI-system, but I would use that system to generate new systems to the "council".
 

I like this idea. Although, if things don't converge, i.e. there is disagreement, this could potentially serve as identifying information that is needed to proceed, or reckon further/efficiently.

Comment by Pattern on Let's See You Write That Corrigibility Tag · 2022-06-20T20:19:34.215Z · LW · GW

Votes aren't public. (Feedback can be.)

Comment by Pattern on Let's See You Write That Corrigibility Tag · 2022-06-20T20:18:17.330Z · LW · GW

-Tell operators anything about yourself they may want to or should know. 

...

but of course explain what you think the result will be to them

Possible issue: They won't have time to listen. This will limit the ability to:

defer to human operators.

 

Also, does defer to human operators take priority over 'humans must understand consequences'?

Comment by Pattern on Let's See You Write That Corrigibility Tag · 2022-06-20T20:13:54.551Z · LW · GW

better than the tag overall

What does this mean? Improve on what you've (the OP has) already written that's here (LW) tagged corrigibility?

 

The overall point make sense, see how far you can go on:

'principles for corrigbility'.

The phrasing at the end of the post was a little weird though.

Comment by Pattern on Humans are very reliable agents · 2022-06-20T20:05:06.995Z · LW · GW

We rarely recognize how extraordinary it is that so many people can go entire lifetimes neither realizing how often, all that is necessary to steal famous artworks, is to find a way to use a window as a door, or grab a drywall knife and make a door, nor having to perform surgery on themselves.

Comment by Pattern on [Linkpost] The importance of stupidity in scientific research · 2022-06-19T05:34:47.682Z · LW · GW

The article is short enough - One page! - you should read it instead of the description that follows. One thing I appreciate about is that it covers just a subject, briefly, and does so well.

I'm not sure if I have the right to copy the article over, so I didn't. I came across a screenshot of it online, and looked up the source above.

 

This article is about how feeling stupid is a sign of ignorance, but it's something that happens when you're learning (e.g grad+), especially when you're working on projects to find out things that no else has yet. (e.g. PhD.)

 

At first I thought that on lesswrong, if someone was writing something like this, they'd probably make up some new words, or title it like: "The Feeling of Ignorance". I looked up the definition of stupidity to see what I could find, and have pasted some results below. "Following" denotes which line I chose to follow (a part of) by looking up the definition (of a word from that line).

Before getting to that, I am first pasting what I wrote after that:

I would also note that 'lacking good judgement' might be how someone might characterize themselves having been in hindsight, when they are no longer ignorant. This seems unavoidable when no one has the necessary knowledge.

 

I think the title as is, is a part of the piece addressing an issue, and have not altered it for that reason.

-

stupidity
stoo͞-pĭd′ĭ-tē, styoo͞-
noun
The quality or condition of being stupid.
A stupid act, remark, or idea.
A state of stupor or stupefaction; torpidity of feeling or of mind.

Following 1:

stupid
stoo͞′pĭd, styoo͞′-
adjective
Slow to learn or understand; obtuse.
Tending to make poor decisions or careless mistakes.
Marked by a lack of intelligence or care; foolish or careless.

Following 3:

foolish
foo͞′lĭsh
adjective
Lacking or exhibiting a lack of good sense or judgment; silly.
Capable of arousing laughter; absurd or ridiculous.
Embarrassed; abashed.

Following 3:

embarrassed
adjective
feeling uneasily or unpleasantly self-conscious due to some event or circumstance.
feeling inferior or unworthy and hence unpleasantly self-conscious.
Having a feeling of shameful discomfort.

 

ETA:

  • What is now the second sentence.
  • "" around Following. Search "Following" to find that.
Comment by Pattern on AGI Ruin: A List of Lethalities · 2022-06-09T03:56:31.241Z · LW · GW

So, again, you end up needing alignment to generalize way out of the training distribution

I assume this is 'you need alignment if you are going to try 'generalize way out of the training distribution and give it a lot of power'' (or you will die).

And not something else like 'it must stay 'aligned' - and not wirehead itself - to pull something like this off, even though it's never done that before'. (And thus 'you need alignment to do X', not because you will die if you do, but because alignment means something like 'the ability to generalize way out of the training distribution, and not, it's 'safe'* even though it's doing that.)

*Safety being hard to define in a technical way, such that the definition can provide safety. (Sort of.)

... This happens in practice in real life, it is what happened in the only case we know about, and it seems to me that there are deep theoretical reasons to expect it to happen again: the first semi-outer-aligned solutions found, in the search ordering of a real-world bounded optimization process, are not inner-aligned solutions. This is sufficient on its own, even ignoring many other items on this list, to trash entire categories of naive alignment proposals which assume that if you optimize a bunch on a loss function calculated using some simple concept, you get perfect inner alignment on that concept.

Are there examples of inner-aligned solutions? (It seems I'm not up to date on this.)

Comment by Pattern on Why do some people try to make AGI? · 2022-06-06T20:08:30.933Z · LW · GW

'This problem seems hard. Perhaps making AI that's generally good, and then having the AI do it would be easier.'

Comment by Pattern on The case for using the term 'steelmanning' instead of 'principle of charity' · 2022-06-04T00:10:49.759Z · LW · GW

lack of charity is a flaw of curiosity,

What?

Comment by Pattern on Shortform · 2022-05-30T21:39:11.836Z · LW · GW

What theorem?

Comment by Pattern on Distributed Decisions · 2022-05-30T21:34:31.862Z · LW · GW

How technical is the use of the word 'distributed' here?

 

While arranging my evening, I may perform some Bayesian updates. Maybe I learn that the movie is not available on Netflix, so I ask a friend if they have a copy, then check Amazon when they don’t. This process is reasonably well-characterized as me having a centralized model of the places I might find the movie, and then Bayes-updating that model each time I learn another place where I can/can’t find it.

It seems more like going through a list of places and checking off 'not there' than Bayesian updating. Sure, that's a special case, 

My friends and I, as a system, are not well-modeled as Bayesian updates to a single central knowledge-state; otherwise we wouldn’t check Netflix twice.

but it seems like 'centrality' is less likely to be the thing here than something else. Coordination is mentioned, but it seems more like you both check Netflix because you're not asking 'what if _ checks Netflix'. In other words, maybe you're not acting in a 'Bayesian manner'. Rather than evaluate the probability, you take the action. I would also guess you didn't say Netflix because 'the probability points that way'.

If you watch Netflix a lot (or have used it recently) then it might come to mind quickly. If your friend watches something else a lot, maybe they check there first.

 

There's not much of a benefit of more elaborate protocols here (beyond texting your friend it's not on netflix), if there's not a lot of services to search. (Otherwise you could come up with a list together (or independently) and handle your parts (or pick some off the list at random, figuring that if both of you do that, you're more likely to find it, even if you don't coordinate more).) So I won't go into a lot more detail here, other than mentioning:

There are other considerations at play here than probability: cost. You have Netflix so you check there.

Comment by Pattern on Benign Boundary Violations · 2022-05-27T17:04:58.949Z · LW · GW

1. Yeah, this is tricky. I didn't like the terminology, but I didn't have a replacement. It's hard to come up with a term for this (for reasons discussed at length in the post). I was looking more at 'both are 'boundaries'' and disambiguating that it is your boundary (versus the social one) that you are sort of opting in/asking others to work with you to define. (Opting-in (by self) to boundary exploration (of self by others).) 'Boundary exploration' still doesn't sound good, though 'boundary violation' sounds worse. Emphasizing the opt-in part in the terminology seems helpful, given that it's what you want is a surprise, hence it not being 'someone asks for permission to push you in the pool'.

 

1/2. It seems clear that what you want would involve people asking someone other than the person being surprised. (Like planning a surprise party, or 'Friend A throws Friend B into the pool in order to splash Friend C during a water fight/similar game'.)

 

2. Yeah, aside from the issue over all (surprising seems hard to scale)...You were mostly talking about other things, but it kind of sounded like you wanted a surprise party. (Or to be surprised by, not it, but what would happen there.) That seems like it could be

  • hard to do with a party. 
  • Very dependent on stuff like where you are (versus talking about an abstract topic on LW). (Like, is the weather good enough that, your friends don't tell you where the party will be, and the day of, they surprise you by*...going to the beach. Or some other place that's fun for a group, and it's a surprise.)

*associated details might include, your eyes are covered or closed until you get there etc.

This is a narrower topic than 'how to handle/negotiate fitting the personal bounds rather than the other one, which is being treated in this post as serving a different purpose', so I didn't focus on it more.

 

3. That makes sense.

Comment by Pattern on Benign Boundary Violations · 2022-05-26T23:04:00.145Z · LW · GW

(Prompt:)

The important part would be:

1. The post communicates its point but the terminology could be better. (Which is probably why there are so many "hedges".)
 

Less important:

2. In order to scale up, some things do require opt in/advance notice. Some possibilities are (largely) exclusive of each other. (A costume party and a surprise water balloon fight.)
3. The post mentions different subcultures have different rules, but talks about society boundaries like they are one thing only.

 

(Purpose:)

Overall, I made notes as I read the post. (This post is fairly straightforward and didn't need lots of re-reads to understand, but it is kind of long. More complex and long occasionally go together, so I made notes as I went. It's also useful for more formed thoughts and has a few quotes or points I could go back and re-read, instead of having to skim the whole thing to get back to.)