## Posts

## Comments

**simon** on Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours · 2024-08-06T18:58:17.003Z · LW · GW

I don't think intent aligned AI has to be aligned to an individual - it can also be intent aligned to humanity collectively.

One thing I used to be concerned about is that collective intent alignment would be way harder than individual intent alignment, giving someone a valid excuse to steer an AI to their own personal intent. I no longer think this is the case. Most issues with collective intent I see as likely also affecting individual intent (e.g. literal instruction following vs extrapolation). I see two big issues that might make collective intent harder than individual intent. One is biased information on people's intents and another is difficulty of weighting intents for different people. On reflection though, I see both as non-catastrophic, and an imperfect solution to them is likely better for humanity as a whole than following one person's individual intent.

**simon** on A simple case for extreme inner misalignment · 2024-07-14T16:36:03.802Z · LW · GW

It feels to me like this post is treating AIs as functions from a first state of the universe to a second state of the universe. Which in a sense, anything is... but, I think that the tendency to simplification happens internally, where they operate more as functions from (digital) inputs to (digital) outputs. If you view an AI as a function from a digital input to a digital output, I don't think goals targeting specific configurations of the universe are simple at all and don't think decomposability over space/time/possible worlds are criteria that would lead to something simple.

**simon** on D&D.Sci: Whom Shall You Call? · 2024-07-06T08:14:30.745Z · LW · GW

Thanks abstractapplic! Initial analysis:

Initial stuff that hasn't turned out to be very important:

My immediate thought was that there are likely to be different types of entities we are classifying, so my initial approach was to look at the distributions to try to find clumps.

All of the 5 characteristics (Corporeality, Sliminess, Intellect, Hostility, Grotesqueness) have bimodal distributions with one peak around 15-30 (position varies) and the other peak at around 65-85 (position varies). Overall, the shapes are very similar looking. The trough between the peaks is not very deep, with plenty of intermediate values.

All of these characteristics are correlated with each other.

Looking at sizes of bins for pairs of characteristics, again there appear to be two humps - but this time in the 2d plot only. That is, there is a high/high hump and a low/low hump, but noticeably there does not appear to be, for example, a high-sliminess peak when restricting to low-corporeality data points.

Again, the shape varies a bit between characteristic pairs but overall looks very similar.

Adding all characteristics together gets a deeper trough between the peaks, though still no clean separation.

Overall, it looks to me like there are two types, one with high values of all characteristics, and another with low values of all characteristics, but I don't see any clear evidence for any other groupings so far.

Eyeballing the plots, it looks compatible with no relation between characteristics other than the high/low groupings. Have not checked this with actual math.

In order to get a cleaner separation between the high/low types, I used the following procedure to get a probability estimate for each data point being in the high/low type:

- For each characteristic, sum up all the other characteristics (rather, subtract that characteristic from the total)
- For each characteristic, classify each data point into pretty clearly low (<100 total), pretty clearly high (>300 total) or unclear based on the sum of all the other characteristics
- obtain frequency distribution for the characteristic values for the points classified clearly low and high using the above steps for each characteristic
- smooth in ad hoc manner
- obtain odds ratio from ratio of high and low distributions, ad hoc adjustment for distortions caused by ad hoc smoothing
- multiply odds ratios obtained for each characteristic and obtain probability from odds ratio
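The steps above can be sketched roughly as follows. This is a minimal sketch and not the spreadsheet actually used: the bin width, the add-one smoothing (standing in for the ad hoc smoothing), the equal prior odds and all function names are assumptions made for illustration.

```python
from collections import Counter

def bin_of(value, width=10):
    """Coarse bin index for a characteristic value (the bin width is an assumption)."""
    return int(value // width)

def fit_odds_tables(points, low_cut=100, high_cut=300, width=10, n_bins=11):
    """Per characteristic, build a per-bin odds ratio P(bin | high) / P(bin | low).

    `points` is a list of equal-length tuples of characteristic values.
    A point counts as clearly low/high for characteristic i based on the
    sum of the *other* characteristics, as in the steps above.
    """
    n_chars = len(points[0])
    tables = []
    for i in range(n_chars):
        low, high = Counter(), Counter()
        for p in points:
            rest = sum(p) - p[i]
            if rest < low_cut:
                low[bin_of(p[i], width)] += 1
            elif rest > high_cut:
                high[bin_of(p[i], width)] += 1
        # Add-one smoothing is a stand-in for the ad hoc smoothing in the text.
        n_low = sum(low.values()) + n_bins
        n_high = sum(high.values()) + n_bins
        tables.append({
            b: ((high[b] + 1) / n_high) / ((low[b] + 1) / n_low)
            for b in range(n_bins)
        })
    return tables

def prob_high(point, tables, width=10, prior_odds=1.0):
    """Multiply per-characteristic odds ratios, then convert odds to a probability."""
    odds = prior_odds
    for value, table in zip(point, tables):
        # Clamp to the last bin so out-of-range values still get a ratio.
        odds *= table[min(bin_of(value, width), max(table))]
    return odds / (1 + odds)
```

Multiplying the per-characteristic odds ratios treats the characteristics as independent given the type, which is the same eyeballed assumption mentioned above and has not been checked.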

I think this gives cleaner separation, but still not super great imo, most points 99%+ likely to be in one type or the other, but still 2057 (out of 34374) are between 0.1 and 0.9 in my ad hoc estimator. Todo: look for some function to fit to the frequency distributions and redo with the function instead of ad hoc approach.

Likely classifications of our mansion's ghosts:

low: A,B,D,E,G,H,I,J,M,N,O,Q,S,U,V,W

high: C,F,K,L,P,R,T

To actually solve the problem: I now proceeded to split the data based on exorcist group. Expecting high/low type to be relevant, I split the DD points by likely type (50% cutoff), and then tried some stuff for DD low including a linear regression. Did a couple graphs on the characteristics that seemed to matter (grotesqueness and hostility in this case) to confirm effects looked linear. So, then tried linear regression for DD high and got the same coefficients, within error bars. So then I thought, if it's the same linear coefficients in both cases, I probably could have gotten them from the combined data for DD, don't need to separate into high and low, and indeed linear regression on the combined DD data gave the same coefficients more or less.
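As a toy illustration of that last step, here is a minimal sketch with made-up, noise-free data and a single predictor (the real fit used several characteristics): if the low and high types share the same linear relationship, fitting them separately or together recovers the same coefficients. The `ols` helper and the numbers 500 and 12 are assumptions chosen for the example.

```python
def ols(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def cost(grotesqueness):
    """Made-up cost rule shared by both types (illustration only)."""
    return 500 + 12 * grotesqueness

low_x = list(range(15, 31))   # values typical of the "low" type
high_x = list(range(65, 86))  # values typical of the "high" type

low_fit = ols(low_x, [cost(g) for g in low_x])
high_fit = ols(high_x, [cost(g) for g in high_x])
combined_fit = ols(low_x + high_x, [cost(g) for g in low_x + high_x])
print(low_fit, high_fit, combined_fit)
```

The converse does not hold in general: if the two types had different intercepts or slopes, the combined fit would blend them, which is why checking the split afterwards is still worthwhile.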

Actually finding the answer:

So, then I did regression for the exorcist groups without splitting based on high/low type. (I did split after to check whether it mattered)

Results:

DD cost depends on Grotesqueness and to a lesser extent Hostility.

EE cost depends on all characteristics slightly, Sliminess then Intellect/Grotesqueness being the most important. Note: Grotesqueness less important, perhaps zero effect, for "high" type.

MM cost actually very slightly declines for higher values of all characteristics. (note: less effect for "high" type, possibly zero effect)

PP cost depends mainly on Sliminess. However, slight decline in cost with more Corporeality and increase with more of everything else.

SS cost depends primarily on Intellect. However, slight decline with Hostility and increase with Sliminess.

WW cost depends primarily on Hostility. However, everything else also has at least a slight effect, especially Sliminess and Grotesqueness.

Provisionally, I'm OK with just using the linear regression coefficients without the high/low split, though I will want to verify later if this was causing a problem (also need to verify linearity, only checked for DD low (and only for Grotesqueness and Hostility separately, not both together)).

Results:

| Ghost | Group with lowest estimate | Estimated cost for that group |
|---|---|---|
| A | Spectre Slayers | 1926.301885259 |
| B | Wraith Wranglers | 1929.72034133793 |
| C | Mundanifying Mystics | 2862.35739392631 |
| D | Demon Destroyers | 1807.30638053037 (next lowest: Wraith Wranglers, 1951.91410462716) |
| E | Wraith Wranglers | 2154.47901124028 |
| F | Mundanifying Mystics | 2842.62070661731 |
| G | Demon Destroyers | 1352.86163670857 (next lowest: Phantom Pummelers, 1688.45809434935) |
| H | Phantom Pummelers | 1923.30132492753 |
| I | Wraith Wranglers | 2125.87216703498 |
| J | Demon Destroyers | 1915.0299245701 (next lowest: Wraith Wranglers, 2162.49691339282) |
| K | Mundanifying Mystics | 2842.16499046146 |
| L | Mundanifying Mystics | 2783.55221244497 |
| M | Spectre Slayers | 1849.71986735069 |
| N | Phantom Pummelers | 1784.8259008802 |
| O | Wraith Wranglers | 2269.45361189797 |
| P | Mundanifying Mystics | 2775.89249612121 |
| Q | Wraith Wranglers | 1748.56167086623 |
| R | Mundanifying Mystics | 2940.5652346428 |
| S | Spectre Slayers | 1666.64380523907 |
| T | Mundanifying Mystics | 2821.89307084084 |
| U | Phantom Pummelers | 1792.3319145455 |
| V | Demon Destroyers | 1472.45641559628 (next lowest: Spectre Slayers, 1670.68911559919) |
| W | Demon Destroyers | 1833.86462523462 (next lowest: Wraith Wranglers, 2229.1901870478) |

So that's my provisional solution, and I will pay the extra 400sp one-time fee so that Demon Destroyers can deal with ghosts D, G, J, V, W.
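A quick arithmetic check of whether the 400sp fee pays for itself, as a sketch: it compares the Demon Destroyers estimate with the next-lowest estimate for each of the five ghosts, using the figures from the results above rounded to two decimals. It ignores the other hiring constraints, so it is only a rough check.

```python
# (Demon Destroyers estimate, next-lowest estimate) for ghosts D, G, J, V, W,
# copied from the results above and rounded to two decimals.
dd_ghosts = {
    "D": (1807.31, 1951.91),
    "G": (1352.86, 1688.46),
    "J": (1915.03, 2162.50),
    "V": (1472.46, 1670.69),
    "W": (1833.86, 2229.19),
}
FEE = 400  # one-time Demon Destroyers fee in sp

# Total saved by using Demon Destroyers instead of each ghost's next-best group.
savings = sum(next_best - dd for dd, next_best in dd_ghosts.values())
print(f"savings before fee: {savings:.2f} sp, net of fee: {savings - FEE:.2f} sp")
```

The savings come to roughly 1,320 sp before the fee, so on these estimates the fee is comfortably covered.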

--Edit: whoops, missed most of this paragraph (other than the Demon Destroyers):

"Bad news! In addition to their (literally and figuratively) arcane rules about territory and prices, several of the exorcist groups have all-too-human arbitrary constraints: the Spectre Slayers and the Entity Eliminators hate each other to the point that hiring one will cause the other to refuse to work for you, the Poltergeist Pummelers are too busy to perform more than three exorcisms for you before the start of the social season, and the Demon Destroyers are from far enough away that – unless you eschew using them at all – they’ll charge a one-time 400sp fee just for showing up."

will edit to fix! post edit: Actually my initial result is still compatible with that paragraph, it doesn't involve the Entity Eliminators, and only uses the Phantom Pummelers 3 times. --

Not very confident in my solution (see things to verify above), and if it is indeed this simple it is an easier problem than I expected.

further edit (late July 15 2024): haven't gotten around to checking those things and also my check of linearity, where I did check, binned the data and could be hiding all sorts of patterns.

**simon** on Getting 50% (SoTA) on ARC-AGI with GPT-4o · 2024-06-23T16:42:50.385Z · LW · GW

Huh, I was missing something then, yes. And retrospectively should have thought of it -

it's literally just filling in the blanks for the light blue readout rectangle (which in a human-centric point of view, is arguably simpler to state than my more robotic perspective even if algorithmically more complex) and from that perspective the important thing is not some specific algorithm for grabbing the squares but just finding the pattern. I kind of feel like I failed a humanness test by not seeing that.

**simon** on Getting 50% (SoTA) on ARC-AGI with GPT-4o · 2024-06-23T04:50:24.905Z · LW · GW

Missed this comment chain before making my comment. My complaint is the most natural extrapolation here (as I assess it, unless I'm missing something) would go out of bounds. So either you have ambiguity about how to deal with the out of bounds, or you have a (in my view) less natural extrapolation.

E.g. "shift towards/away from the center" is less natural than "shift to the right/left", what would you do if it were already in the center for example?

**simon** on Getting 50% (SoTA) on ARC-AGI with GPT-4o · 2024-06-23T03:40:33.275Z · LW · GW

~~Problem 2 seems badly formulated because~~

~~The simplest rule explaining the 3 example input-output pairs would make the output corresponding to the test input depend on squares out of bounds of the test input.~~

~~To fix you can have some rule like have the reflection axis be shifted from the center by one in the direction of the light blue "readout" rectangle (instead of fixed at one to the right from the center) or have the reflection axis be centered, and have a 2-square shift in a direction depending on which side of center is the readout rectangle (instead of in a fixed direction), but that seems strictly more complicated.~~

~~Alternatively, you could have some rule about wraparound, or e.g. using white squares if out of bounds, but what rule to use for out of bounds squares isn't determined from the example input-output pairs given.~~

Edit: whoops, see Fabien Roger's comment and my reply.

**simon** on D&D.Sci II: The Sorceror's Personal Shopper · 2024-06-21T04:17:10.079Z · LW · GW

It seems I missed this at the time, but since Lesswrong's sorting algorithm has now changed to bring it up the list for me, might as well try it:

X-Y chart of mana vs thaumometer looked interesting, splitting it into separate charts for each colour returned useful results for blue:

- blue gives 2 diagonal lines, one for tools/weapons, one for jewelry - for tools/weapons it's pretty accurate, +-1, but optimistic by 21 or 23 for jewelry

and... that's basically it, the thaumometer seems relatively useless for the other colours.

But:

green gives an even number of mana that looks uniformish in the range of 2-40

yellow always gives mana in the range of 18-21

red gives mana that can be really high, up to 96, but is not uniform, median 18

easy strategy:

pendant of hope (blue, 77 thaumometer reading -> 54 or 56 mana expected), 34 gp

hammer of capability (blue, 35 thaumometer reading -> 34 or 36 mana expected), 35 gp

Plough of Plenty (yellow, 18-21 mana expected), 35 gp

Warhammer of Justice +1 (yellow, 18-21 mana expected), 41 gp

For a total of at least 124 mana at the cost of 145 gp, leaving 55 gp left over
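The totals can be reproduced with a few lines, as a sketch. The blue-item rule is coded as read off the charts above (tools/weapons within 1 of the reading, jewelry 21 or 23 below it), and the 200 gp budget is inferred from 145 gp spent with 55 gp left over; the function name is made up.

```python
def blue_mana_range(reading, is_jewelry):
    """(min, max) expected mana for a blue item, per the pattern described above."""
    if is_jewelry:
        return reading - 23, reading - 21
    return reading - 1, reading + 1

YELLOW_RANGE = (18, 21)  # yellow items always gave 18-21 mana
BUDGET = 200             # gp, inferred from the totals in the text

# (item, (min mana, max mana), price in gp)
picks = [
    ("Pendant of Hope", blue_mana_range(77, is_jewelry=True), 34),
    ("Hammer of Capability", blue_mana_range(35, is_jewelry=False), 35),
    ("Plough of Plenty", YELLOW_RANGE, 35),
    ("Warhammer of Justice +1", YELLOW_RANGE, 41),
]

min_mana = sum(low for _, (low, _), _ in picks)
spent = sum(price for _, _, price in picks)
print(min_mana, spent, BUDGET - spent)  # 124 145 55
```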

Now, if I was doing this at the time, I would likely investigate further to check if, say, high red or green values can be predicted.

But, I admit I have some meta knowledge here - it was stated in discussion of difficulty of a recent problem, if I recall correctly, that this was one of the easier ones. So, I'm guessing there isn't a hidden decipherable pattern to predict mana values for the reds and greens.

**simon** on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset · 2024-06-19T20:54:54.651Z · LW · GW

You don't need to justify - hail fellow D&Dsci player, I appreciate your competition and detailed writeup of your results, and I hope to see you in the next d&dsci!

**simon** on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset · 2024-06-19T16:53:01.514Z · LW · GW

I liked the bonus objective myself, but maybe I'm biased about that...

As someone who is also not a "data scientist" (but just plays one on lesswrong), I also don't know what exactly actual "data science" is, but I guess it's likely intended to mean using more advanced techniques?

(And if I can pull the same Truth from the void with less powerful tools, should that not mark me as more powerous in the Art? :P)

Perhaps, but don't make a virtue of not using the more powerful tools, the objective is to find the truth, not to find it with handicaps...

Speaking of which, one thing that could help make things easier is aggregating data, eliminating information you think is irrelevant. For example, in this case, I assumed early on (without actually checking) that timing would likely be irrelevant, so I aggregated data for ingredient combinations. As in, each tried ingredient combination gets only one row, with the numbers of different outcomes listed. You can do this by assigning a unique identifier to each ingredient combination (in this case you can just concatenate over the ingredient list), then counting the results for the different unique identifiers. COUNTIFS has poor performance for large data sets, but you can sort using the identifiers, then make a column that adds up the number of rows (or the number of rows with a particular outcome) since the last change in the identifier, and then filter for the last row before each change in the identifier (be wary of off-by-one errors). Then copy the result (values only) to a new sheet.
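Outside a spreadsheet, the same aggregation is a few lines of Python. This is a minimal sketch: the row format and the example rows are made up, and sorting the ingredient names before joining them is an added assumption so that the same combination listed in a different order gets the same identifier.

```python
from collections import Counter, defaultdict

def aggregate(rows):
    """Collapse per-attempt rows into one entry per ingredient combination.

    `rows` is an iterable of (ingredients, outcome) pairs. The identifier is
    the sorted ingredient names joined together, mirroring the concatenation
    trick described above.
    """
    counts = defaultdict(Counter)
    for ingredients, outcome in rows:
        key = "|".join(sorted(ingredients))
        counts[key][outcome] += 1
    return counts

# Made-up rows, for illustration only.
rows = [
    (["Troll Blood", "Vampire Fang", "Demon Claw"], "Regeneration Potion"),
    (["Demon Claw", "Troll Blood", "Vampire Fang"], "Regeneration Potion"),
    (["Eye of Newt"], "Inert Glop"),
]
for key, outcomes in aggregate(rows).items():
    print(key, dict(outcomes))
```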

This also reduces the number of rows, though not enormously in this case.

Of course, in this case, it turns out that timing was relevant, not for outcomes but only for the ingredient selection (so I would have had to reconsider this assumption to figure out the ingredient selection).

**simon** on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset · 2024-06-19T16:22:39.158Z · LW · GW

I thought the flavour text was just right - I got it from the data, not the flavour text, and saw the flavour text as confirmation, as you intended.

I was really quite surprised by how many players analyzed the data well enough to say "Barkskin potion requires Crushed Onyx and Ground Bone, Necromantic Power Potion requires Beech Bark and Oaken Twigs" and then went on to say "this sounds reasonable, I have no further questions." (Maybe the onyx-necromancy connection is more D&D lore than most players knew? But I thought that the bone-necromancy and bark-barkskin connections would be obvious even without that).

Illusion of transparency I think, hints are harder than anyone making them thinks.

When I looked at the ingredients for a "barkskin potion", as far as I knew at this point the ingredients were arbitrary, so in fact I don't recall finding it suspicious at all. Then later I remember looking at the ingredients for a "necromantic power potion" and thinking something like... "uh... maybe wood stuff is used for wands or something to do necromancy?". It was only when I explicitly made a list of the ingredients for each potion type, rather than looking at each potion individually, and could see that everything else made sense, that I realized the twist.

**simon** on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-11T01:03:34.556Z · LW · GW

Post-solution extra details:

Quantitative hypothesis for how the result is calculated:

"Magical charge": number of ingredients that are in the specific list in the parent comment. I'm copying the "magically charged" terminology from Lorxus.

"Eligible" for a potion: Having the specific pair of ingredients for the potion listed in the grandparent comment, or at the top of Lorxus' comment.

- Get Inert Glop or Magical Explosion with probability depending on the magical charge:
  - 0-1 -> 100% chance of Inert Glop
  - 2 -> 50% chance of Inert Glop
  - 3 -> neither, skip to next step
  - 4 -> 50% chance of Magical Explosion
  - 5+ -> 100% chance of Magical Explosion
- If didn't get either of those, get Mutagenic Ooze at 1/2 chance if eligible for two potions or 2/3 chance if eligible for 3 potions (presumably would be (n-1)/n chance for n eligible potions at higher n).
- If didn't get that either, randomly get one of the potions the ingredients are eligible for, if any.
- If not eligible for any potions, get Acidic Slurry.
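The hypothesis above can be written as a small simulator. This is a sketch of the hypothesised rules and not the actual generator: the function name and structure are made up, the magical list and essential pairs are the ones given in the other comments in this thread (with potion names as labelled in the data), and the Mutagenic Ooze chance is coded as (n-1)/n for n eligible potions because that reproduces the stated 1/2 and 2/3.

```python
import random

MAGICAL = {
    "Angel Feather", "Beholder Eye", "Demon Claw", "Dragon Scale",
    "Dragon Spleen", "Dragon Tongue", "Dragon's Blood", "Ectoplasm",
    "Faerie Tears", "Giant's Toe", "Troll Blood", "Vampire Fang",
}
ESSENTIAL = {
    "Barkskin Potion": {"Crushed Onyx", "Ground Bone"},
    "Farsight Potion": {"Beholder Eye", "Eye of Newt"},
    "Fire Breathing Potion": {"Dragon Spleen", "Dragon's Blood"},
    "Fire Resist Potion": {"Crushed Ruby", "Dragon Scale"},
    "Glibness Potion": {"Dragon Tongue", "Powdered Silver"},
    "Growth Potion": {"Giant's Toe", "Redwood Sap"},
    "Invisibility Potion": {"Crushed Diamond", "Ectoplasm"},
    "Necromantic Power Potion": {"Beech Bark", "Oaken Twigs"},
    "Rage Potion": {"Badger Skull", "Demon Claw"},
    "Regeneration Potion": {"Troll Blood", "Vampire Fang"},
}

def brew(ingredients, rng=random):
    """Sample one outcome for an ingredient selection under the hypothesis."""
    ingredients = set(ingredients)
    charge = len(ingredients & MAGICAL)

    # Step 1: Inert Glop at low charge, Magical Explosion at high charge.
    p_glop = {0: 1.0, 1: 1.0, 2: 0.5}.get(charge, 0.0)
    p_boom = 0.0 if charge < 4 else (0.5 if charge == 4 else 1.0)
    if rng.random() < p_glop:
        return "Inert Glop"
    if rng.random() < p_boom:
        return "Magical Explosion"

    # Step 2: Mutagenic Ooze when eligible for more than one potion.
    eligible = sorted(p for p, pair in ESSENTIAL.items() if pair <= ingredients)
    n = len(eligible)
    if n >= 2 and rng.random() < (n - 1) / n:
        return "Mutagenic Ooze"

    # Step 3: a random eligible potion, else Acidic Slurry.
    if eligible:
        return rng.choice(eligible)
    return "Acidic Slurry"
```

For example, `brew(["Troll Blood", "Vampire Fang", "Demon Claw"])` has charge 3 and exactly one eligible potion, so under these rules it always returns the Regeneration Potion.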

todo (will fill in below when I get results): figure out what's up with ingredient selection.

edit after aphyer already posted the solution:

I didn't write up what I had found before aphyer posted the result, but I did notice the following:

- hard 3-8 range in total ingredients
- pairs of ingredients within selections being biased towards pairs that make potions
- ingredient selections with 3 magical ingredients being much more common than ones with 2 or 4, and in turn more common than ones with 0-1 or 5+
- and, this is robust when restricting to particular ingredients regardless of whether they are magical or not, though obviously with some bias as to how common 2 and 4 are

- the order of commonness of ingredients, holding actual magicalness constant, is relatively similar when restricted to 2 and 4 magic ingredient selections, though obviously whether an ingredient is actually magical is a big influence here
- I checked the distributions of total times a selection was chosen for different possible selections of ingredients, specifically for: each combination of total number of nonmagical ingredients and 0, 1 or 2 magical ingredients
- I didn't get around to 3 and more magical ingredients, because I noticed that while for 0 and 1 magical ingredients the distributions looked Poisson-like (i.e. as would be expected if it were random, though in fact it wasn't entirely random), it definitely wasn't Poisson for the 2 ingredient case, and got sidetracked by trying to decompose into a Poisson distribution + extra distribution (and eventually by other "real life" stuff)
- I did notice that this looked possibly like a randomish "explore" distribution which presumably worked the same as for the 0 and 1 ingredient case along with a non-random, or subset-restricted "exploit" distribution, though I didn't really verify this

**simon** on Why I don't believe in the placebo effect · 2024-06-10T15:13:40.818Z · LW · GW

Two different questions:

- Does receiving a placebo cause an actual physiological improvement?
- Does receiving a placebo cause the report of the patient's condition to improve?

The answer to the first question can be "no" while the second is still "yes", e.g. due to patients' self-reports of subjective conditions (pain, nausea, depression) being biased to what they think the listener wants to hear, particularly if there's been some ritualized context (as discussed in kromem's comment) reinforcing that that's what they "should" say.

Note that a similar effect could apply anywhere in the process where a subjective decision is made (e.g. if a doctor makes a subjective report on the patient's condition).

**simon** on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-10T08:01:42.231Z · LW · GW

Followup and actual ingredients to use:

Mutagenic Ooze is a failure mode that can happen if there are essential ingredients for multiple potions (can also get either potion or Inert Glop or Magical Explosion if eligible).

There are 12 "magical" ingredients. An ingredient is magical iff it is a product of a magical creature (i.e.: Angel Feather, Beholder Eye, Demon Claw, Dragon Scale, Dragon Spleen, Dragon Tongue, Dragon's Blood, Ectoplasm, Faerie Tears, Giant's Toe, Troll Blood, Vampire Fang).

Inert Glop is a possible outcome if there are 2 or fewer magical ingredients, and is guaranteed for 1 or fewer.

Magical Explosion is a possible outcome if there are 4 or more magical ingredients, and is guaranteed if there are 5 or more.

(Barkskin and necromantic power seem "harder" since their essential ingredients are both nonmagical, requiring more additional ingredients to get the magicness up.)

Therefore: success should be guaranteed if you select the 2 essential ingredients for the desired potion, plus enough other ingredients to have exactly 3 magical ingredients in total, while avoiding selecting both essential ingredients for any other potion. For the ingredients available:

To get "Necromantic Power Potion" (actual Barkskin):

Beech Bark + Oaken Twigs + Demon Claw + Giant's Toe + either Troll Blood or Vampire Fang

To get "Barkskin Potion" (actual Necromantic Power):

Crushed Onyx + Ground Bone + Demon Claw + Giant's Toe + either Troll Blood or Vampire Fang

To get Regeneration Potion:

Troll Blood + Vampire Fang + either Demon Claw or Giant's Toe
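The rule behind these recipes can be checked mechanically. A minimal sketch, assuming the rule exactly as stated above (the target's essential pair present, exactly 3 magical ingredients, no other potion's full essential pair); the function name is made up, and potion names are as labelled in the data, so the "Barkskin Potion" entry is the one suspected to really be Necromantic Power.

```python
MAGICAL = {
    "Angel Feather", "Beholder Eye", "Demon Claw", "Dragon Scale",
    "Dragon Spleen", "Dragon Tongue", "Dragon's Blood", "Ectoplasm",
    "Faerie Tears", "Giant's Toe", "Troll Blood", "Vampire Fang",
}
ESSENTIAL = {
    "Barkskin Potion": {"Crushed Onyx", "Ground Bone"},
    "Farsight Potion": {"Beholder Eye", "Eye of Newt"},
    "Fire Breathing Potion": {"Dragon Spleen", "Dragon's Blood"},
    "Fire Resist Potion": {"Crushed Ruby", "Dragon Scale"},
    "Glibness Potion": {"Dragon Tongue", "Powdered Silver"},
    "Growth Potion": {"Giant's Toe", "Redwood Sap"},
    "Invisibility Potion": {"Crushed Diamond", "Ectoplasm"},
    "Necromantic Power Potion": {"Beech Bark", "Oaken Twigs"},
    "Rage Potion": {"Badger Skull", "Demon Claw"},
    "Regeneration Potion": {"Troll Blood", "Vampire Fang"},
}

def is_reliable(recipe, target):
    """True if the recipe has only the target's essential pair and exactly 3 magical ingredients."""
    recipe = set(recipe)
    eligible = {p for p, pair in ESSENTIAL.items() if pair <= recipe}
    return eligible == {target} and len(recipe & MAGICAL) == 3

recipes = {
    "Necromantic Power Potion": ["Beech Bark", "Oaken Twigs", "Demon Claw", "Giant's Toe", "Troll Blood"],
    "Barkskin Potion": ["Crushed Onyx", "Ground Bone", "Demon Claw", "Giant's Toe", "Vampire Fang"],
    "Regeneration Potion": ["Troll Blood", "Vampire Fang", "Giant's Toe"],
}
for target, recipe in recipes.items():
    print(target, is_reliable(recipe, target))
```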

I expect I'm late to the party here on the solution... (edit: yes, see abstractapplic's very succinct, yet sufficient-to-prove-knowledge comment, and Lorxus's much, much more detailed one)

**simon** on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T20:16:21.285Z · LW · GW

Maybe...

a love of ridiculous drama, a penchant for overcomplicated schemes, and **a strong tendency to frequently disappear to conduct secretive 'archmage business'**

Lying in order to craft a Necromantic Power Potion is certainly a bad sign, but still compatible with him being some other dark wizard type rather than the "Loathsome Lich" in particular.

Regarding your proposal in second comment: even if he is undead he might not need to drink it to tell what it is. Still, could be worth a shot! (now three different potion types to figure out how to make...)

**simon** on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T18:34:02.641Z · LW · GW

Observations so far:

Each potion has two essential ingredients (necessary, but not sufficient).

Barkskin Potion: Crushed Onyx and Ground Bone

Farsight Potion: Beholder Eye and Eye of Newt

Fire Breathing Potion: Dragon Spleen and Dragon's Blood

Fire Resist Potion: Crushed Ruby and Dragon Scale

Glibness Potion: Dragon Tongue and Powdered Silver

Growth Potion: Giant's Toe and Redwood Sap

Invisibility Potion: Crushed Diamond and Ectoplasm

Necromantic Power Potion: Beech Bark and Oaken Twigs

Rage Potion: Badger Skull and Demon Claw

Regeneration Potion: Troll Blood and Vampire Fang

Most of these make sense. Except...

I *strongly* suspect that Archmage Anachronos is trying to trick me into getting him to brew a Necromantic Power Potion, and has swapped around the "Barkskin Potion" and "Necromantic Power Potion" output reports. In character, I should obtain information from other sources to confirm this. For the purpose of this problem, I will try both to solve the stated problem and to work out how to make a "Necromantic Power Potion" (i.e. an actual Barkskin Potion) to troll Anachronos.

Barkskin and Necromantic Power also seem to be the two toughest potions to make, never succeeding with just a third ingredient added.

An attempt that includes essential ingredients for multiple potions does not necessarily fail, some ingredient combinations can produce multiple potion types.

There are four failure modes:

Acidic Slurry never happens if the essential ingredients of any potion are included, but not all attempts that lack essential ingredients are Acidic Slurry. So, I'm guessing Acidic Slurry is a residual if it doesn't have the required ingredients for any potion and doesn't hit any of the other failure modes.

Inert Glop tends to happen with low numbers of ingredients, initial guess is it happens if "not magical enough" in some sense, and guessing Magical Explosion is the opposite, dunno about Mutagenic Ooze yet.

Back to Barkskin and Necromantic Power:

Either one has lots of options to make reliably with just one non-available ingredient, but not (in the stats provided) avoiding all the unavailable ingredients. Each has some available options that produce the potion type most of the time, but with Inert Glop some of the time. There's one ingredient combination that has produced a "Barkskin Potion" the only time it has been tried, but it has the essential ingredients for a Growth Potion so would likely not reliably produce a "Barkskin Potion".

Many ingredient combos, especially towards higher ingredient numbers, haven't been tried yet, so plenty of room to find an actually reliable solution if I can figure out more about the mechanics. The research will continue...

**simon** on D&D.Sci (Easy Mode): On The Construction Of Impossible Structures · 2024-05-17T06:29:59.715Z · LW · GW

Looks like architects apprenticed under B. Johnson or P. Stamatin always make impossible structures.

Architects apprenticed under M. Escher, R. Penrose or T. Geisel never do.

Self-taught architects sometimes do and sometimes don't. It doesn't initially look promising to figure out who will or won't in this group - many cases of similar proposals sometimes succeeding and sometimes failing.

Fortunately, we do have 5 architects (D,E,G,H,K) apprenticed under B. Johnson or P. Stamatin, so we can pick the 4 of them likely to have the lowest cost proposals.

Cost appears to depend primarily (only?) on the materials used.

dreams < wood < steel < glass < silver < nightmares

Throwing out architect G's glass and nightmares proposal as too expensive, that leaves us with D,E,H,K as the architect selections.

(edit: and yes, basically what everyone else said before me)

**simon** on Is being a trans woman (or just low-T) +20 IQ? · 2024-04-25T05:01:54.271Z · LW · GW

I always assumed that, since high IQ is correlated with high openness, the higher openness would be the cause of higher likelihood of becoming trans.

(or, some more general situation where IQ is causing transness more than the other way around, e.g. high scores on IQ tests might be caused to some extent by earnestness/intensity etc., which could also cause more likelihood of becoming trans)

**simon** on D&D.Sci: The Mad Tyrant's Pet Turtles · 2024-04-08T04:27:04.804Z · LW · GW

So I had some results I didn't feel were complete enough to make a comment on (in the sense that subjectively I kept on feeling that there was some follow-on thing I should check to verify it or make sense of it), then got sidetracked by various stuff, including planning and now going on a ~~trip~~ sacred pilgrimage to see the eclipse. Anyway:

all of these results relate to the "main group" (non-fanged, 7-or-more segment turtles):

Everything seems to have some independent relation with weight (except nostril size afaik, but I didn't particularly test nostril size). When you control for other stuff, wrinkles and scars (especially scars) become less important relative to segments.

The effect of abnormalities seems suspiciously close to 1 lb on average per abnormality (so, subjectively I think it might be 1). Adding abnormalities has an effect that looks like smoothing (in a biased manner so as to increase the average weight): the weight distribution peak gets spread out, but the outliers don't get proportionately spread out. I had trouble finding a smoothing function* that I was satisfied exactly replicated the effect on the weight distribution however. This could be due to it not being a smoothing function, me not guessing the correct form, or me guessing the correct form and getting fooled by randomness into thinking it doesn't quite fit.

For green turtles with zero miscellaneous abnormalities, the distribution of scars looked somewhat close to a Poisson distribution. For the same turtles, the distribution of wrinkles on the other hand looked similar but kind of spread out a bit...like the effect of a smoothing function. And they both get spread out more with different colours. Hmm. Same spreading happens to some extent with segments as the colours change.

On the other hand, segment distribution seemed narrower than Poisson, even one with a shifted axis, and the abnormality distribution definitely looks nothing like Poisson (peaks at 0, diminishes far slower than a 0-peak Poisson).

Anyway, on the basis of not very much clear evidence but on seeming plausibility, some wild speculation:

I speculate there is a hidden variable, age. Effect of wrinkles and greyer colour (among non-fanged turtles) could be a proxy for age, and not a direct effect (names of those characteristics are also suggestive). Scars is likely a weaker proxy for age and also no direct effect. I guess segments likely do have some direct effect, while also being a (weak, like scars) proxy for age. Abnormalities clearly have a direct effect. Have not properly tested interactions between these supposed direct effects (age, segments, abnormalities), but if abnormality effect doesn't stack additively with the other effects, it would be harder for the 1-lb-per-abnormality size of the abnormality effect to be a non-coincidence.

So, further wild speculation: the age effect on weight could also be a smoothing function (though it looks like the high-weight tail is thicker for greenish-gray - does that suggest it is not a smoothing function?).

unknown: is there an inherent uncertainty in the weight given the characteristics, or does there merely appear to be because of the age proxies being unreliable indicators of age? is that even distinguishable?

* by smoothing function I think I mean another random variable that you add to the first one, this other random variable takes on a range of values within a relatively narrow range. (e.g. uniform distribution from 0.0 to 2.0, or e.g. 50% chance of being 0.2, 50% chance of being 1.8).
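In that sense, smoothing is just the distribution of a sum of two independent random variables, i.e. a convolution. A minimal sketch with a made-up base weight distribution and the second example smoother from the footnote; the function names are illustrative only.

```python
def convolve(pmf_a, pmf_b):
    """Distribution of A + B for independent A and B, given as {value: probability}."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

def mean(pmf):
    """Expected value of a {value: probability} distribution."""
    return sum(v * p for v, p in pmf.items())

# Made-up base weight distribution with a sharp peak at 20 lb.
base = {19: 0.1, 20: 0.8, 21: 0.1}
# Footnote example: 50% chance of adding 0.2, 50% chance of adding 1.8 (mean 1.0).
smoother = {0.2: 0.5, 1.8: 0.5}

smoothed = convolve(base, smoother)
print("peak height:", max(base.values()), "->", max(smoothed.values()))
print("mean shift:", mean(smoothed) - mean(base))
```

With these numbers the peak is split in two and halves in height while the mean rises by about 1, which is the qualitative pattern described for abnormalities: a spread-out peak plus an average increase of around 1 lb.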

Anyway, this all feels figure-outable even though I haven't figured it out yet. Some guesses where I throw out most of the above information (apart from prioritization of characteristics) because I haven't organized it to generate an estimator, and just guess ad hoc based on similar datapoints, plus Flint and Harold copied from above:

Abigail 21.6, Bertrand 19.3, Chartreuse 27.7, Dontanien 20.5, Espera 17.6, Flint 7.3, Gunther 28.9, Harold 20.4, Irene 26.1, Jacqueline 19.7

**simon**on Beauty and the Bets · 2024-03-31T20:19:34.239Z · LW · GW

Well, as you may see it's also is not helpful

My reasoning explicitly puts instrumental rationality ahead of epistemic. I hold this view precisely to the degree which I do in fact think it is helpful.

The extra category of a "fair bet" just adds another semantic disagreement between halfers and thirders.

It's just a criterion by which to assess disagreements, not adding something more complicated to a model.

Regarding your remarks on these particular experiments:

If someone thinks the typical reward structure is some reward structure, then they'll by default guess that a proposed experiment has that reward structure.

This reasonably can be expected to apply to halfers or thirders.

If you convince me that halfer reward structure is typical, I go halfer. (As previously stated since I favour the typical reward structure). To the extent that it's not what I would guess by default, that's precisely because I don't intuitively feel that it's typical and feel more that you are presenting a weird, atypical reward structure!

And thirder utilities are modified *during* the experiment. They are not just specified by a betting scheme, they go back and forth based on the knowledge state of the participant - behave the way probabilities are supposed to behave. And that's because they are partially probabilities - a result of incorrect factorization of E(X).

Probability is a mathematical concept with very specific properties. In my previous post I talk about it specifically and show that thirder probabilities for Sleeping Beauty are ill-defined.

I've previously shown that some of your previous posts incorrectly model the Thirder perspective, but I haven't carefully reviewed and critiqued all of your posts. Can you specify exactly what model of the Thirder viewpoint you are referencing here? (which will not only help me critique it but also help me determine what exactly you mean by the utilities changing in the first place, i.e. do you count Thirders evaluating the total utility of a possibility branch more highly when there are more of them as a "modification" or not (I would not consider this a "modification")).

**simon**on D&D.Sci: The Mad Tyrant's Pet Turtles · 2024-03-31T19:35:38.086Z · LW · GW

updates:

In the fanged subset:

I didn't find anything that affects weight of fanged turtles independently of shell segment number. The apparent effect from wrinkles and scars appears to be mediated by shell segment number. Any non-shell-segment-number effects on weight are either subtle or confusingly change directions to mostly cancel out in the large scale statistics.

Using linear regression, if you force intercept=0, then you get a slope close to 0.5 (i.e. avg weight= 0.5*(number of shell segments) as suggested by qwertyasdef), and that's tempting to go for for the round number, but if you don't force intercept=0 then 0 intercept is well outside the error bars for the intercept (though it's still low, 0.376-0.545 at 95% confidence). If you don't force intercept=0 then the slope is more like 0.45 than 0.5. There is also a decent amount of variation which increases in a manner that could be plausibly linear with the number of shell segments (not really that great-looking a fit to a straight line with intercept 0 but plausibly close enough, I didn't do the math). Plausibly this could be modeled by each shell segment having a weight drawn from a distribution (average 0.45) and the total weight being the sum of the weights for each segment. If we assume some distribution in discrete 0.1lb increments, the per-segment variance looks to be roughly the amount supplied by a d4.
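A minimal sketch of the forced-intercept comparison, in pure Python. The data here are made up and noise-free (a true line of 0.5 + 0.45 × segments over a hypothetical range of 4 to 20 segments), not the actual turtle dataset, and the function names are illustrative.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for y = slope * x (intercept forced to 0)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def fit_with_intercept(xs, ys):
    """Least-squares (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: the assumed true relationship has a nonzero intercept.
segments = list(range(4, 21))
weights = [0.5 + 0.45 * s for s in segments]

slope_origin = fit_through_origin(segments, weights)
slope_free, intercept_free = fit_with_intercept(segments, weights)
print(round(slope_origin, 3), round(slope_free, 3), round(intercept_free, 3))
```

When the true intercept is positive, forcing the line through the origin pushes the fitted slope above the true per-segment value, which is one way a slope near 0.45 can show up as something closer to 0.5.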

So, I am now modeling fanged turtle weight as 0.5 base weight plus a contribution of 0.1*(1d4+2) for each segment. And no, I am not very confident that's anything to do with the real answer, but it seems plausible at least and seems to fit pretty well.

The sole fanged turtle among the Tyrant's pets, Flint, has a massive 14 shell segments and at that number of segments the cumulative probability of the weight being at or below the estimated value passes the 8/9 threshold at 7.3 lbs, so that's my estimate for Flint.
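The Flint estimate can be reproduced with an exact calculation under the speculative model above (0.5 lb base plus 0.1*(1d4+2) per segment). This is only a sketch under that assumed model; the function names are illustrative, and weights are kept in integer tenths of a pound to avoid floating-point issues.

```python
from fractions import Fraction

def segment_sum_pmf(n_segments, die_sides=4):
    """Exact pmf of the sum of n_segments fair dice, as {total: probability}."""
    pmf = {0: Fraction(1)}
    face_prob = Fraction(1, die_sides)
    for _ in range(n_segments):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, die_sides + 1):
                nxt[total + face] = nxt.get(total + face, Fraction(0)) + p * face_prob
        pmf = nxt
    return pmf

def weight_quantile_tenths(n_segments, threshold):
    """Smallest weight (in tenths of a lb) whose cumulative probability reaches threshold."""
    pmf = segment_sum_pmf(n_segments)
    cumulative = Fraction(0)
    for dice_total in sorted(pmf):
        cumulative += pmf[dice_total]
        if cumulative >= threshold:
            # 0.5 lb base = 5 tenths; each segment adds (d4 + 2) tenths.
            return 5 + dice_total + 2 * n_segments
    raise ValueError("threshold must be at most 1")

flint_tenths = weight_quantile_tenths(14, Fraction(8, 9))
print(flint_tenths / 10)
```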

In the non-fanged, more than 6 segment main subset:

Shell segment number doesn't seem to be the dominant contributor here, all the numerical characteristics correlate with weight, will investigate further.

Abnormalities don't seem to affect or be affected by anything but weight. This is not only useful to know for separating abnormality-related and other effects on weight, but also implies (I think) that nothing is downstream of weight causally, since that would make weight act as a link for correlations with other things.

This doesn't rule out the possibility of some other variable (e.g. age) that other weight-related characteristics might be downstream of. More investigation to come. I'm now holding off on reading others' comments (beyond what I read at the time of my initial comment) until I have a more complete answer myself.

**simon**on D&D.Sci: The Mad Tyrant's Pet Turtles · 2024-03-30T07:16:33.253Z · LW · GW

Thanks abstractapplic! Initial observations:

There are multiple subpopulations, and at least some that are clearly disjoint.

The 3167 fanged turtles are all gray, and only fanged turtles are gray. Fanged turtles always weigh 8.6lb or less. Within the fanged turtles it seems shell segment number is pretty decently correlated with weight. Wrinkles and scars have weaker correlations with weight but also correlate with shell segment number, so I'm not sure they have an independent effect; will have to disentangle.

Non-fanged turtles always weigh 13.0 lbs or more. There are no turtles weighing between 8.6lb and 13.0lb.

The 5404 turtles with exactly 6 shell segments all have 0 wrinkles or anomalies, are green, have no fangs, have normal sized nostrils, and weigh exactly 20.4lb. None of that is unique to 6-shell-segment turtles, but that last bit makes guessing Harold's weight pretty easy.

Among the 21460 turtles that don't belong in either of those groups, all of the numerical characteristics correlate with weight, and notably the number of abnormalities doesn't seem to correlate with other numerical characteristics, so it likely has some independent effect. Grayer colours tend to have higher weight, but also correlate with other things that seem to affect weight, so will have to disentangle.

edit: both qwertyasdef and Malentropic Gizmo identified these groups before my comment including 6-segment weight, and qwertyasdef also remarked on the correlation of shell segment number to weight among fanged turtles.

**simon**on Beauty and the Bets · 2024-03-28T18:50:02.529Z · LW · GW

Throughout your comment you've been saying a phrase "thirders odds", apparently meaning odds 1:2, not specifying whether per awakening or per experiment. This is underspecified and confusing category which we should taboo.

Yeah, that was sloppy language, though I do like to think more in terms of bets than you do. One of my ways of thinking about these sorts of issues is in terms of "fair bets" - ~~each person thinks a bet with payoffs that align with their assumptions about utility is "fair", and a bet with payoffs that align with different assumptions about utility is "unfair".~~ Edit: to be clear, a "fair" bet for a person is one where the payoffs are such that the betting odds where they break even matches the probabilities that that person would assign.

I do not claim that. I say that in order to justify not betting differently, thirders have to retroactively change the utility of a bet already made:

I critique thirdism not for making different bets - as the first part of the post explains, the bets are the same, but for their utilities not actually behaving like utilities - constantly shifting back and forth during the experiment, including shifts backwards in time, in order to compensate for the fact that their probabilities are not behaving as probabilities - because they are not sound probabilities as explained in the previous post.

Wait, are you claiming that thirder Sleeping Beauty is supposed to always decline the initial per experiment bet - before the coin was tossed at 1:1 odds? This is wrong - both halfers and thirders are neutral towards such bets, though they appeal to different reasoning why.

OK, I was also being sloppy in the parts you are responding to.

Scenario 1: bet about a coin toss, nothing depending on the outcome (so payoff equal per coin toss outcome)

- 1:1

Scenario 2: bet about a Sleeping Beauty coin toss, payoff equal per awakening

- 2:1

Scenario 3: bet about a Sleeping Beauty coin toss, payoff equal per coin toss outcome

- 1:1

It doesn't matter if it's agreed to before or after the experiment, as long as the payoffs work out that way. Betting *within the experiment* is one way for the payoffs to more naturally line up on a per-awakening basis, but it's only relevant (to bet choices) to the extent that it affects the payoffs.
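The three scenarios can be checked with a short exact calculation. This is a sketch under the standard setup (fair coin, one awakening on heads, two on tails); the function name and the `payoff_per_awakening` flag are illustrative, and the odds are expressed as tails : heads.

```python
from fractions import Fraction

COIN = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

def break_even_odds(awakenings, payoff_per_awakening):
    """Break-even odds (tails : heads) for a bet on the coin toss.

    awakenings: {"heads": n, "tails": m}, how many times the bet is settled
    per outcome when the payoff is per awakening.
    """
    weight = {
        outcome: COIN[outcome] * (awakenings[outcome] if payoff_per_awakening else 1)
        for outcome in COIN
    }
    return weight["tails"] / weight["heads"]

plain_coin = {"heads": 1, "tails": 1}
sleeping_beauty = {"heads": 1, "tails": 2}

scenario_1 = break_even_odds(plain_coin, payoff_per_awakening=False)
scenario_2 = break_even_odds(sleeping_beauty, payoff_per_awakening=True)
scenario_3 = break_even_odds(sleeping_beauty, payoff_per_awakening=False)
print(scenario_1, scenario_2, scenario_3)
```

Nothing in the calculation refers to when the bet is agreed to; only the payoff structure enters.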

Now, the conventional Thirder position (as I understand it) consistently applies equal utilities per awakening when considered from a position within the experiment.

I don't actually know what the Thirder position is supposed to be from a standpoint from before the experiment, but I see no contradiction in assigning equal utilities per awakening from the before-experiment perspective as well.

As I see it, Thirders will only regret a bet (in the sense of considering it a bad choice to enter into *ex ante* given their current utilities) if you do some kind of bait and switch where you don't make it clear what the payoffs were going to be up front.

But what I'm pointing at, is that thirdism naturally fails to develop an optimal strategy for per experiment bet in technicolor problem, falsly assuming that it's isomorphic to regular sleeping beauty.

Speculation; have you actually asked Thirders and Halfers to solve the problem? (while making clear the reward structure? - note that if you don't make clear what the reward structure is, Thirders are more likely to misunderstand the question asked if, as in this case, the reward structure is "fair" from the Halfer perspective and "unfair" from the Thirder perspective).

Technicolor and Rare Event problems highlight the issue that I explain in Utility Instability under Thirdism - in order to make optimal bets thirders need to constantly keep track of not only probability changes but also utility changes, because their model keeps shifting both of them back and forth and this can be very confusing. Halfers, on the other hand, just need to keep track of probability changes, because their utility are stable. Basically thirdism is strictly more complicated without any benefits and we can discard it on the grounds of Occam's razor, if we haven't already discarded it because of its theoretical unsoundness, explained in the previous post.

A Halfer has to discount their utility based on how many of them there are, a Thirder doesn't. It seems to me, contrary to your perspective, that Thirder utility is more stable.

Halfer model correctly highlights the rule how to determine which cases these are and how to develop the correct strategy for betting. Thirder model just keeps answering 1/3 as a broken clock.

... and in my hasty reading and response I misread the conditions of the experiment (it's a "Halfer" reward structure again). (As I've mentioned before in a comment on another of your posts, I think Sleeping Beauty is unusually ambiguous so both Halfer and Thirder perspectives are viable. But, I lean toward the general perspectives of Thirders on other problems (e.g. SIA seems much more sensible (edit: in most situations) to me than SSA), so Thirderism seems more intuitive to me).

Thirders can adapt to different reward structures but need to actually notice what the reward structure is!

What do you still feel that is unresolved?

the things mentioned in this comment chain. Which actually doesn't feel like all that much, it feels like there's maybe one or two differences in philosophical assumptions that are creating this disagreement (though maybe we aren't getting at the key assumptions).

Edited to add: The criterion I mainly use to evaluate probability/utility splits is typical reward structure - you should assign probabilities/utilities such that a typical reward structure seems "fair", so you don't wind up having to adjust for different utilities when the rewards have the typical structure (you do have to adjust if the reward structure is atypical, and thus seems "unfair").

This results in me agreeing with SIA in a lot of cases. An example of an exception is Boltzmann brains. A typical reward structure would give no reward for correctly believing that you are a Boltzmann brain. So you should always bet in realistic bets as if you aren't a Boltzmann brain, and for this to be "fair", I set P=0 instead of SIA's U=0. I find people believing silly things about Boltzmann brains like taking it to be evidence against a theory if that theory proposes that there exists a lot of Boltzmann brains. I think more acceptance of the setting of P=0 instead of U=0 here would cut that nonsense off. To be clear, normal SIA does handle this case fine (that a theory predicting Boltzmann brains is not evidence against it), but setting P=0 would make it more obvious to people's intuitions.

In the case of Sleeping Beauty, this is a highly artificial situation that has been pared down of context to the point that it's ambiguous what would be a typical reward structure, which is why I consider it ambiguous.

**simon**on Beauty and the Bets · 2024-03-27T17:45:40.506Z · LW · GW

The central point of the first half or so of this post - that for E(X) = P(X)U(X) you could choose different P and U for the same E so bets can be decoupled from probabilities - is a good one.
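One hedged way to make this concrete: take a bet on tails that is settled at every awakening, winning $w$ or losing $s$ each time (the stakes $w$ and $s$ are illustrative symbols, not from the post). A thirder-style split counts utility once per awakening; a halfer-style split counts total utility per coin outcome.

$$
E_{\text{thirder}} = \tfrac{2}{3}\,w - \tfrac{1}{3}\,s, \qquad
E_{\text{halfer}} = \tfrac{1}{2}\,(2w) - \tfrac{1}{2}\,s = w - \tfrac{1}{2}\,s = \tfrac{3}{2}\,E_{\text{thirder}}
$$

The two expressions differ only by a positive constant, so both splits break even at $s = 2w$ and recommend the same bets.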

I would put it this way: choices and consequences are in the territory*; probabilities and utilities are in the map.

Now, it could be that some probability/utility breakdowns are more sensible than others based on practical or aesthetic criteria, and in the next part of this post ("Utility Instability under Thirdism") you make an argument against thirderism based on one such criterion.

However, your claim that Thirder Sleeping Beauty would bet differently before and after the coin toss is not correct. If Sleeping Beauty is asked before the coin toss to bet *based on the same reward structure as after the toss* she will bet the same way in each case - i.e. Thirder Sleeping Beauty will bet Thirder odds even before the experiment starts, if the coin toss being bet on is particularly the one in this experiment and the reward structure is such that she will be rewarded equally (as assessed by her utility function) for correctness in each awakening.

Now, maybe you find this dependence on what the coin will be used for counterintuitive, but that depends on your own particular taste.

Then, the "technicolor sleeping beauty" part seems to make assumptions where the reward structure is such that it only matters whether you bet or not in a particular universe and not how many times you bet. This is a very "Halfer" assumption on reward structure, even though you are accepting Thirder odds in this case! Also, Thirders can adapt to such a reward structure as well, and follow the same strategy.

Finally, on Rare Event Sleeping beauty, it seems to me that you are biting the bullet here to some extent to argue that this is not a reason to favour thirderism.

I think, we are fully justified to discard thirdism all together and simply move on, as we have resolved all the actual disagreements.

uh....no. But I do look forward to your next post anyway.

*edit: to be more correct, they're less far up the map stack than probability and utilities. Making this clarification just in case someone might think from that statement that I believe in free will (I don't).

**simon**on Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI · 2024-02-18T07:19:21.233Z · LW · GW

I think there's a (kind of) loophole here, where we use an "abstract hypothetical" model of a hypothetical future, and optimize for consequences our actions for that hypothetical. Is this what you mean by "understood in abstract terms"?

More or less, yes (in the case of engineering problems specifically, which I think is more real-world-oriented than most science AI).

The part I don't understand is why you're saying that this is "simpler"? It seems equally complex in kolmogorov complexity and computational complexity.

What I'm saying is "simpler" is that, given a problem that doesn't need to depend on the actual effects of the outputs on the future of the real world (where operating in a simulation is an example, though one that could become riskily close to the real world depending on the information taken into account by the simulation - it might not be a good idea to include highly detailed political risks of other humans thwarting construction in a fusion reactor construction simulation for example), it is simpler for the AI to solve that problem without taking into consideration the effects of the output on the future of the real world than it is to take into account the effects of the output on the future of the real world anyway.

**simon**on And All the Shoggoths Merely Players · 2024-02-12T03:09:44.889Z · LW · GW

Doomimir: But you claim to understand that LLMs that emit plausibly human-written text aren't human. Thus, the AI is not the character it's playing. Similarly, being able to predict the conversation in a bar, doesn't make you drunk. What's there not to get, even for you?

So what?

You seem to have an intuition that if you don't understand all the mechanisms for how something works, then it is likely to have some hidden goal and be doing its observed behaviour for instrumental reasons. E.g. the "Alien Actress".

And that makes sense from an evolutionary perspective where you encounter some strange intelligent creature doing some mysterious actions on the savannah. I do not think it makes sense if you specifically trained the system to have that particular behaviour by gradient descent.

I think, if you trained something by gradient descent to have some particular behaviour, the most likely thing that resulted from that training is a system tightly tuned to have that particular behaviour, with the *simplest* arrangement that leads to the trained behaviour.

And if the behaviour you are training something to do is something that doesn't necessarily involve actually trying to pursue some long-range goal, it would be very strange, in my view, for it to turn out that the *simplest* arrangement to provide that behaviour calculates the effects of the output on the long-range future in order to determine what output to select.

Moreover even if you tried to train it to want to have some effect on the future, I expect you would find it more difficult than expected, since it would learn various heuristics and shortcuts long before actually learning the very complicated algorithm of generating a world model, projecting it forward given the system's outputs, and selecting the output that steers the future to the particular goal. (To others: This is not an invitation to try that. Please don't).

That doesn't mean that an AI trained by gradient descent on a task that usually doesn't involve trying to pursue a long range goal can never be dangerous, or that it can never have goals.

But it does mean that the danger and the goals of such a usually-non-long-range-task-trained AI, if it has them, are *downstream* of its behaviour.

For example, an extremely advanced text predictor might predict the text output of a dangerous agent through an advanced simulation that is itself a dangerous agent.

And if someone actually manages to train a system by gradient descent to do real-world long range tasks (which probably is a lot easier than making a text predictor *that* advanced), well then...

BTW all the above is specific to gradient descent. I do expect self-modifying agents, for example, to be much more likely to be dangerous, because actual goals lead to wanting to enhance one's ability and inclination to pursue those goals, whereas non-goal-oriented behaviour will not be self-preserving in general.

**simon**on Why Two Valid Answers Approach is not Enough for Sleeping Beauty · 2024-02-09T17:43:30.586Z · LW · GW

And in Sleeping Beauty case, as I'm going to show in my next post, indeed there are troubles justifying thirders sampling assumption with other conditions of the setting

I look forward to seeing your argument.

I'm giving you a strong upvote for this. It's rare to find a person who notices that Sleeping Beauty is quite different from other "antropic problems" such as incubator problems.

Thanks! But I can't help but wonder if one of your examples of someone who doesn't notice is my past self making the following comment (in a thread for one of your previous posts) which I still endorse:

I certainly agree that one can have philosophical assumptions such that you sample differently for Sleeping Beauty and Incubator problems, and indeed I would not consider the halfer position particularly tenable in Incubator, whereas I do consider it tenable in Sleeping Beauty.

But ... I did argue in that comment that it is still possible to take a consistent thirder position on both. (In the comment I take the thirder position for sleeping beauty for granted, and argue for it still being possible to apply to Incubator (rather than the other way around, despite being more pro-thirder for Incubator), specifically to rebut an argument in that earlier post of yours that the classic thirder position for Sleeping Beauty didn't apply to Incubator).

Some clarification of my actual view here (rather than my defense of conventional thirderism):

In my view, sampling is not something that occurs in reality, when the "sampling" in question includes sampling between multiple entities that both exist. Each of the entities that actually exists actually exists, and any "sampling" between multiple of such entities occurs (only) in the mind of the observer. (However, it can still mix with conventional sampling, in the mind of the observer). Which sampling assumption you use in such cases is in principle arbitrary but in practice should probably be based on how much you care about the correctness of the beliefs of each of the possible entities you are uncertain about being.

Halferism or thirderism for Sleeping Beauty are both viable, in my view, because one could argue for caring equally about being correct at each awakening (resulting in thirderism) or one could argue for caring equally about being correct collectively in the awakenings for each of the coin results (resulting in halferism). There isn't any particular "skin in the game" to really force a person to make a commitment here.

**simon**on Training of superintelligence is secretly adversarial · 2024-02-07T16:07:05.754Z · LW · GW

You seem to be assuming that the ability of the system to find out if security assumptions are false affects whether the falsity of the assumptions has a bad effect. Which is clearly the case for some assumptions - "This AI box I am using is inescapable" - but it doesn't seem immediately obvious to me that this is generally the case.

Generally speaking, a system can have bad effects if made under bad assumptions (think a nuclear reactor or aircraft control system) even if it doesn't understand what it's doing. Perhaps that's less likely for AI, of course.

And on the other hand, an intelligent system could be aware that an assumption would break down in circumstances that haven't arrived yet, and not do anything about it (or even tell humans about it).

**simon**on What's this 3rd secret directive of evolution called? (survive & spread & ___) · 2024-02-07T15:23:59.734Z · LW · GW

how often you pop up out of nowhere

Or evolve from something else. (which you clearly intended based, e.g. on your mention of crabs, but didn't make clear in that sentence)

**simon**on Why Two Valid Answers Approach is not Enough for Sleeping Beauty · 2024-02-06T19:13:04.184Z · LW · GW

Thirders believe that *this awakening* should be treated as randomly sampled from three possible awakening states. Halfers believe that *this awakening* should be treated as randomly sampled from two possible states, corresponding to the result of a coin toss. This is an objective disagreement, that can be formulated in terms of probability theory and at least one side inevitably has to be in the wrong. This is the unresolved issue that we can't simply dismiss because both sides have a point.

If you make some assumptions about sampling, probability theory will give one answer, with other assumptions probability theory will give another answer. So both can be defended with probability theory, it depends on the sampling assumptions. And there isn't necessarily any sampling assumption that's objectively correct here.

By the way I normally agree with thirders in terms of my other assumptions about anthropics, but in the case of Sleeping Beauty since it's particularly formulated to separate the multiple awakenings from impacting on the rest of the world including the past and future, I think the halfer sampling assumption isn't necessarily crazy.

**simon**on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-04T22:12:42.449Z · LW · GW

It seems to me we should have a strong prior that it was lab-produced, given its immediate high infectiousness. What evidence does Peter Miller provide to overcome that prior?

edited to add:

on reading https://www.astralcodexten.com/p/practically-a-book-review-rootclaim , I found the discussion on how the Furin cleavage site was coded significantly changed my view towards natural origin (the rest of the evidence presented was much less convincing).

2nd edit after that: hmm that's evidence against direct genetic manipulation but not necessarily against evolution within a lab. Back to being rather uncertain.

3rd edit: The "apparently" in the following seems rather suspicious:

COVID is hard to culture. If you culture it in most standard media or animals, it will quickly develop characteristic mutations. But the original Wuhan strains didn’t have these mutations. The only ways to culture it without mutations are in human airway cells, or (apparently) in live raccoon-dogs. Getting human airway cells requires a donor (ie someone who donates their body to science), and Wuhan had never done this before (it was one of the technologies only used at the superior North Carolina site). As for raccoon-dogs, it sure does seems suspicious that the virus is already suited to them.

I would like to know what the evidence is that these characteristic mutations don't arise when cultured in raccoon-dogs. If that claim is false, it would be significant evidence in favour of a lab leak (if it's true, it's weaker but still relevant evidence for natural origin).

**simon**on Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI · 2024-02-03T19:38:30.182Z · LW · GW

While some disagreement might be about relatively mundane issues, I think there's some more fundamental disagreement about agency as well.

In my view, in order to be dangerous in a particularly direct way (instead of just misuse risk etc.), an AI's decision to give output X depends on the fact that output X has some specific effects in the future.

Whereas, if you train it on a problem where solutions don't need to depend on the effects of the outputs on the future, I think it much more likely to learn to find the solution without routing that through the future, because that's simpler.

So if you train an AI to give solutions to scientific problems, I don't think, in general, that that needs to depend on the future, so I think that it's likely to learn the direct relationships between the data and the solutions. I.e. it's not merely a logical possibility to make it not especially dangerous, but that's the default outcome if you give it problems that don't need to depend on specific effects of the output.

Now, if you were instead to give it a problem that had to depend on the effects of the output on the future, then it would be dangerous...but note that e.g. chess, even though it maps onto a game played in the real world in the future, can also be understood in abstract terms so you don't actually need to deal with anything outside the chess game itself.

**In general, I just think that predicting the future of the world and choosing specific outputs based on their effects on the real world is a complicated way to solve problems and expect things to take shortcuts when possible.**

Once something does care about the future, then it will have various instrumental goals about the future, but the initial step about actually caring about the future is very much not trivial in my view!

**simon**on Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI · 2024-01-29T01:48:26.100Z · LW · GW

Science is usually a real-world task.

Fair enough, a fully automated do-everything science-doer would, in order to do everything science-related, have to do real-world tasks and would thus be dangerous. That being said, I think there's plenty of room for "doing science" (up to some reasonable level of capability) without going all the way to automation of real-world aspects - you can still have an assistant that thinks up theory for you, just can't have something that does the experiments as well.

Part of your comment (e.g. point 3) relates to how the AI would in practice be rewarded for achieving real-world effects, which I agree is a reason for concern. Thus, as I said, "you might need to be careful not to evaluate in such a way that it will wind up optimizing for real-world effects, though".

Your comment goes beyond this however, and seems to assume in some places that merely knowing or conceptualizing about the real world will lead to "forming goals" about the real world.

I actually agree that this may be the case with AI that self-improves, since if an AI that has a slight tendency toward a real-world goal self-modifies, its tendency toward that real-world goal will tend to direct it to enhance its alignment to that real-world goal, whereas its tendencies not directed towards real-world goals will in general happily overwrite themselves.

If the AI does not self-improve however, then I do not see that as being the case.

If the AI is not being rewarded for the real-world effects, but instead being rewarded for scientific outputs that are "good" according to some criterion that does not depend on their real-world effects, then it will learn to generate outputs that are good according to that criterion. I don't think that would, in general, lead it to select actions that would steer the world to some particular world-state. To be sure, these outputs would have effects on the real world - a design for a fusion reactor would tend to lead to a fusion reactor being constructed, for example - but if the particular outputs are not rewarded based on the real-world outcome then they will also not tend to be selected based on the real-world outcome.

Some less relevant nitpicks of points in your comment:

Even if an AI is only trained in a limited domain (e.g. math), it can still have objectives that extend outside of this domain

If you train an AI on some very particular math then it could have goals relating to the future of the real world. I think, however, that the math you would need to train it on to get this effect would have to be very narrow, and likely have to either be derived from real-world data, or involve the AI studying itself (which is a component of the real world after all). I don't think this happens for generically training an AI on math.

> As an example, if we humans discovered we were in a simulation, we could easily have goals that extend outside of the simulation (the obvious one being to make sure the simulators didn’t turn us off).

true, but see above and below.

> Chess AIs don’t develop goals about the real world because they are too dumb.

If you have something trained by gradient descent solely on doing well at chess, it's not going to consider anything outside the chess game, no matter how many parameters and how much compute it has. Any consideration of outside-of-chess factors lowers the resources for chess, and is selected against until it reaches the point of subverting the training regime (which it doesn't reach, since it is selected against before then).

Even if you argue that if it's smart enough, additional computing power is neutral, the gradient descent doesn't actually reward out-of-context thinking for chess, so it couldn't develop except by sheer chance outside of somehow being a side-effect of thinking about chess itself - but chess is a mathematically "closed" domain so there doesn't seem to be any reason out-of-context thinking would be developed.

The same applies to math in general where the math doesn't deal with the real world or the AI itself. This is a more narrow and more straightforward case than scientific research in general.

**simon**on Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI · 2024-01-27T00:03:58.845Z · LW · GW

I'm not convinced by the argument that AI science systems are necessarily dangerous.

It's generically* the case that any AI that is trying to achieve some real-world future effect is dangerous. In that linked post Nate Soares used chess as an example, which I objected to in a comment. An AI that is optimizing *within a chess game* isn't thereby dangerous, as long as the optimization stays within the chess game. E.g., an AI might reliably choose strong chess moves, but still not show real-world Omohundro drives (e.g. not avoiding being turned off).

I think scientific research is more analogous to chess than trying to achieve a real-world effect in this regard (even if the scientific research has real-world side effects), in that you can, in principle, optimize for reliably outputting scientific insights without actually leading the AI to output anything based on its real-world effects. (the outputs are selected based on properties aligned with "scientific value", but that doesn't necessarily require the assessment to take into account how it will be used, or any other effect on the future of the world. You might need to be careful not to evaluate in such a way that it will wind up optimizing for real-world effects, though).

Note: an AI that can "build a fusion rocket" is generically dangerous. But an AI that can design a fusion rocket, if that design is based on general principles and not tightly tuned on what will produce some exact real-world effect, is likely not dangerous.

*generically dangerous: I use this to mean that an AI with this property is going to be dangerous unless some unlikely-by-default (and possibly very difficult) safety precautions are taken.

**simon**on D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset] · 2024-01-22T21:46:41.914Z · LW · GW

Thanks abstractapplic.

Retrospective:

While the multiplicative nature of the data might have tripped someone up who just put the data into a tool that assumed additivity, it wasn't hard to see that it wasn't; in my case I looked at an x-y chart of performance vs Murphy's Constant and immediately assumed that at least Murphy's Constant likely had a multiplicative effect; additivity wasn't something I recall consciously considering even to reject it.

I did have fun, though I would have preferred for there to be something more of relevance to the answer than more multiplicative effects. My greatest disappointment, however, is that you called one of the variables the "Local Value of Pi" and gave it no angular or trigonometric effects whatsoever. Finding some subtle relation with the angular coordinates would have been quite pleasing.

I see that I correctly guessed the exact formulas for the effects of Murphy's Constant and Local Value of Pi; on the other hand, I did guess at some constant multipliers possibly being exact and was wrong, and not even that close (I had been moving to doubting their exactness and wasn't assuming exactness in my modeling, but didn't correct my comment edit about it).

The lowest hanging fruit that I missed seems to me to be checking the distribution of the (multiplicative) residuals; I had been wondering if there was some high-frequency angle effect, perhaps with a mix of the provided angular coordinates or involving the local value of pi, to account for most of the residuals, but seeing a normal-ish distribution would have cast doubt on that.* (It might not be entirely normal - I recall seeing a bit of extra spread for high Murphy's Constant and think now that it might have been due to rounding effects, though I didn't consider that at the time).

*edit: on second thought, even if I found normal residuals, I might still have possibly dismissed this as potentially due to smearing from multiple small errors in different parameters.

**simon**on D&D.Sci Hypersphere Analysis Part 4: Fine-tuning and Wrapup · 2024-01-18T21:32:03.081Z · LW · GW

Ah, that would be it. (And I should have realized before that the linear prediction using logs would be different in this way). No, my formulas don't relate to the log. I take the log for some measurement purposes but am dividing out my guessed formula for the multiplicative effect of each thing on the total, rather than subtracting a formula that relates to the log of it.

So, I guess you could check to see if these formulas work satisfactorily for you:

log(1-0.004*(Murphy's Constant)^3) and log(1-10*abs((Local Value of Pi)-3.15))

In my graphs, I don't see an effect that looks clearly non-random. Like, it could be wiggled a little bit but not with a systematic effect more than around a factor of 0.003 or so and not more than I could believe is due to chance. (To reduce random noise, though, I ought to extend to the full dataset rather than the restricted set I am using).

**simon**on D&D.Sci(-fi): Colonizing the SuperHyperSphere · 2024-01-18T07:11:00.255Z · LW · GW

update:

on Murphy:

I think that the overall multiplication factor from Murphy's constant is 1-0.004*(Murphy's constant)^3 - this appears close enough, I don't think I need linear or quadratic terms.

On Pi:

I think the multiplication factor is probably 1-10*abs((local Value of Pi)-3.15) - again, appears close enough, and I don't think I need a quadratic term.

Regarding aphyer saying cubic doesn't fit Murphy's, and both unnamed and aphyer saying Pi needs a quadratic term, I am beginning to suspect that ~~maybe they are modeling these multipliers in a somewhat different way, perhaps 1/x from the way I am modeling it?~~ (I am modeling each function as a multiplicative factor that multiplies together with the others to get the end result).

edited to add: aphyer's formulas predict the log; my formulas predict the output, then I take the log after if I want to (e.g. to set a scaling factor). I think this is likely the source of the discrepancy. If predicting the log, put each of these formulas in a log (e.g. log(1-10*abs((local Value of Pi)-3.15))).
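The difference between predicting the output and predicting its log can be seen numerically. The sketch below uses the two guessed factors quoted above; the function names are made up for illustration, and nothing here is fitted to the actual data.

```python
import math

def murphy_multiplier(murphy: float) -> float:
    """Guessed multiplicative factor from the comment: 1 - 0.004 * m^3."""
    return 1 - 0.004 * murphy ** 3

def pi_multiplier(local_pi: float) -> float:
    """Guessed multiplicative factor from the comment: 1 - 10 * |pi - 3.15|."""
    return 1 - 10 * abs(local_pi - 3.15)

def log_contribution(multiplier: float) -> float:
    """The term a model of log(performance) needs for the same factor."""
    return math.log(multiplier)

# Equal steps away from 3.15 change the multiplier by equal amounts,
# but change its log by growing amounts, which looks like curvature
# to a model that is linear in the log.
for local_pi in (3.15, 3.16, 3.17):
    m = pi_multiplier(local_pi)
    print(local_pi, round(m, 4), round(log_contribution(m), 4))
```

A fit done on the log scale would therefore want an extra (for example quadratic) term even if the underlying multiplier is exactly piecewise linear.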

**simon**on D&D.Sci(-fi): Colonizing the SuperHyperSphere · 2024-01-18T06:59:12.649Z · LW · GW

> Still kept the nearest neighbors calculation to account for any other location relevance (there is a little but much less now). That left me with 4 nines of correlation between predicted & actual performance,

Interesting, that definitely suggests some additional influences that we haven't explicitly taken account of, rather than random variation.

> added a quadratic term to my rescaling of Local Value of Pi (because the dropoff from 3.15 isn't linear)

As did aphyer, but I didn't see any such effect, which is really confusing me. I'm pretty sure I would have noticed it if it were anywhere near as large as aphyer shows in his post.

edit: on the pi issue see my reply to my own comment. ~~Did you account for these factors as divisors dividing from a baseline, or multipliers multiplying a baseline (I did the latter)?~~ edit: a conversation with aphyer clarified this. I see you are predicting log performance, as with aphyer, so a linear effect on the multiplier would then have a log taken of it which makes it nonlinear.

**simon**on D&D.Sci Hypersphere Analysis Part 4: Fine-tuning and Wrapup · 2024-01-18T04:27:53.402Z · LW · GW

Huh. On Pi I hadn't noticed

the nonlinearity of the effect of distance from 3.15

edit: I would definitely have seen anything as large as 3% like what you're showing there. Not sure what the discrepancy is from.

, I will look again at that.

Your new selection of points is exactly the same as mine, though slightly different order. Your errors now look smaller than mine.

On Murphy:

It seemed to me a 3rd degree polynomial fits Murphy's Constant's effect very well ~~(note, this is also including smaller terms than the highest order one - these other terms can suppress the growth at low values so it can grow enough later)~~

edit: looking into it, it's still pretty good if I drop the linear and quadratic terms. Not only that but I can set the constant term to 1 and the cubic term to -0.004 and it still seems a decent fit.

...which along with the pi discrepancy makes me wonder if there's some 1/x effect here, did I happen to model things the way around that abstractapplic set them up and are you modeling the 1/x of it?

**simon**on An Actually Intuitive Explanation of the Oberth Effect · 2024-01-18T01:24:01.291Z · LW · GW

In the case where it's instantaneous, "at the start" would effectively mean right before (e.g. a one-sided limit).

**simon**on D&D.Sci Hypersphere Analysis Part 3: Beat it with Linear Algebra · 2024-01-17T10:14:59.940Z · LW · GW

Hi aphyer, nice analysis and writeup and also interesting observations here and in the previous posts. Some comments in spoiler tags:

Shortitude: I found that shortitude >45 penalized performance. I didn't find any effect from Deltitude.

Skitterers: I haven't seen large random errors (in a restricted part of the data which is all I considered - No/EXTREMELY, Mint/Burning/Copper, Silence/Skittering) so they should be relatively safe.

I only have pi peaking near 3.15.

Burning is indeed better than mint.

On the few equatorial points - I very much don't think it's an effect of a hypersphere, but imagine that abstractapplic (accidentally?) used some function to generate the values that did a full wave from -90 to 90 instead of a half wave. I haven't checked to see if that works out quantitatively.

In general the problem seemed somewhat unnaturally well fit to the way I tried to solve it (I didn't check a lot of the other things you did, and after relatively little initial exploration just tried dividing out estimated correction factors from the effects of Murphy's constant, pi, etc. Which turned out to work better than it should have due to the things actually being multiplicative and, at least so far, cleanly dependent on one variable at a time.)

From a priority perspective your post here preceded my comment on abstractapplic's post.

**simon**on D&D.Sci(-fi): Colonizing the SuperHyperSphere · 2024-01-17T09:29:51.578Z · LW · GW

Thanks for giving us this puzzle, abstractapplic.

My answer (possibly to be refined later, but I'll check others' responses and aphyer's posts after posting this):

id's: 96286,9344,107278,68204,905,23565,8415,83512,62718,42742,16423,94304

observations and approach used:

After some initial exploration I considered only a single combination of qualitative traits (No/Mint/Adequate/['Eerie Silence'], though I think it wouldn't have mattered if I chose something else) in order to study the quantitative variables without distractions.

Since Murphy's constant had the biggest effect, I first chose an approximation for the effect of Murphy's Constant (initially a parabola), then divided the ZPPG data by my prediction for Murphy's constant to get the effects of another variable (in this case, the local value of pi) to show up better. And so on, going back to refine my previously guessed functions as the noise from other variables cleared up.

As it turned out, this approach was unreasonably effective as the large majority of the variation (at least for the traits I ended up studying - see below) seems to be accounted for by multiplicative factors, each factor only taking into account one of the traits or variables.
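A minimal sketch of the divide-out procedure, run on synthetic stand-in data rather than the puzzle's data set. The baseline of 100, the variable ranges, and all names are assumptions for illustration; the two factor shapes are borrowed from the follow-up comments, and the guesses are taken to be exact here, which real data would not allow.

```python
import random

# Synthetic stand-in data, NOT the puzzle's data set. The factor shapes are
# borrowed from the later comments purely so there is something to recover.
def factor_murphy(m):
    return 1 - 0.004 * m ** 3

def factor_pi(p):
    return 1 - 10 * abs(p - 3.15)

BASELINE = 100.0  # hypothetical nominal output

random.seed(0)
rows = []
for _ in range(200):
    m = random.uniform(0.0, 5.0)
    p = random.uniform(3.10, 3.20)
    rows.append((m, p, BASELINE * factor_murphy(m) * factor_pi(p)))

# Step 1: divide out the current guess for the largest effect (Murphy).
after_murphy = [(p, z / factor_murphy(m)) for m, p, z in rows]

# Step 2: what is left depends on pi alone, so its shape is easy to read
# off a plot; dividing that out too should leave just the baseline.
leftover = [z / factor_pi(p) for p, z in after_murphy]
spread = max(leftover) - min(leftover)
print(f"spread after dividing out both factors: {spread:.2e}")
```

With real data each guess is imperfect, so the loop is to divide out, inspect what is left, and go back to refine the earlier guesses as the remaining noise shrinks.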

Murphy's constant:

Cubic (I tried to get it to fit some kind of exponential, or even logistic function, because I had a headcanon explanation of something like that a higher value causes problems at a higher rate and the individual problems would multiply together before subtracting from nominal. (Or something.) But cubic fits better.) ~~It visually looks like it's inflecting near the extreme values of the data (not checked quantitatively) so maybe it's a (cubic) spline.~~

Local Value of Pi:

Piecewise linear, peaking around 3.15, same slope on either side I think. I tried to fit a sine to it first, similar reasons as with Murphy and exponentials.

Latitude:

Piecewise constant, lower value if between -36 and 36.

Longitude:

This one seems to be a sine, though not literally sin(x) - displaced vertically and horizontally. I briefly experimented to see if I could get a better fit substituting the local value of pi for our boring old conventional value, didn't seem to work, but maybe I implemented that wrong.

Shortitude:

Another piecewise constant. Lower value if greater than 45. Unlike latitude, this one is not symmetrical - it only penalizes in the positive direction.

Deltitude:

I found no effect.

Traits:

I only considered traits that seemed relatively promising from my initial exploration (really just what their max value was and how many tries they needed to get it): No or EXTREMELY, Mint, Burning or Copper, (any Feng Shui) and ['Eerie Silence'] or ['Otherworldly Skittering'].

All traits tested seemed to me to have a constant multiplier.

Values in my current predictor (may not have been tested on all the relevant data, and significant digits shown are not justified):

Extremely (relative to No): 0.94301

Burning, Copper (relative to Mint): 1.0429, 0.9224

Exceptional, Disharmonious (relative to Adequate): 1.0508,0.8403 - edit: I think these may actually be 1.05, 0.84 exactly.

Skittering (relative to Silence): 0.960248
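Since each trait is modeled as a constant multiplier, the trait part of the predictor is just a product of lookups. The sketch below uses the values listed above; the dictionary keys and the function name are assumptions, and baseline traits are set to 1.0 by definition.

```python
# Multipliers quoted in the comment, relative to the baseline trait in each group.
# Key spellings are assumed; traits the comment did not study are deliberately absent.
TRAIT_MULTIPLIER = {
    "No": 1.0, "EXTREMELY": 0.94301,
    "Mint": 1.0, "Burning": 1.0429, "Copper": 0.9224,
    "Adequate": 1.0, "Exceptional": 1.0508, "Disharmonious": 0.8403,
    "Eerie Silence": 1.0, "Otherworldly Skittering": 0.960248,
}

def combined_trait_multiplier(traits):
    """Multiply the per-trait factors; raises KeyError for an unstudied trait."""
    result = 1.0
    for trait in traits:
        result *= TRAIT_MULTIPLIER[trait]
    return result

print(combined_trait_multiplier(["EXTREMELY", "Burning", "Adequate", "Eerie Silence"]))
```

Raising on an unknown trait, rather than silently using 1.0, reflects that the other traits were not studied and may well have their own effects.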

Residual errors typically within 1%, relatively rarely above 1.5%. There could be other things I missed (e.g. non-multiplicative interactions) to account for the rest, or afaik it could be random. Since I haven't studied other traits than the ones listed, clues could also be lurking in those traits.

Using my overall predictor, my expected values for the 12 sites listed above are about:

96286: 112.3, 9344: 110.0, 107278: 109.3, 68204: 109.2, 905: 109.0, 23565: 108.1, 8415: 106.5, 83512: 106.0, 62718: 105.9, 42742: 105.7, 16423: 105.4, 94304: 105.2

Given my error bars in the (part that I actually used of the) data set I'm pretty comfortable with this selection (in terms of building instead of folding, not necessarily that these are the best choices), though I should maybe check to see if any is right next to one of those cutoffs (latitude/shortitude) and I should also maybe be wary of extrapolating to very low values of Murphy's Constant. (e.g. 94304, 23565, 96286)

edited to add: aphyer's third post (which preceded this comment) has the same sort of conclusion and some similar approximations (though mine seem to be more precise), and unnamed also mentioned that it appears to be a bunch of things multiplied together. All of aphyer's posts have a lot of interesting general findings as well.

edited to also add: the second derivative of a cubic is a linear function. The cubic having zero second derivative at two different points is thus impossible unless the linear function is zero, which happens only when the first two coefficients of the cubic are zero (so the cubic is linear). So my mumbling about inflection points at **both** ends is complete nonsense... however, it does have close to zero second derivative near 0, so maybe it is a spline where we are seeing one end of it where the second derivative is set to 0 at that end. todo: see what happens if I actually set that to 0
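Written out, with coefficient names $a, b, c, d$ introduced here only for illustration:

$$p(x) = a x^3 + b x^2 + c x + d, \qquad p''(x) = 6 a x + 2 b$$

$$p''(x_1) = p''(x_2) = 0,\quad x_1 \neq x_2 \;\Rightarrow\; 6a\,(x_1 - x_2) = 0 \;\Rightarrow\; a = 0 \;\Rightarrow\; b = 0$$

By contrast, a zero second derivative at $x = 0$ alone only requires $p''(0) = 2b = 0$, i.e. dropping the quadratic term while keeping the cubic one.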

edited again: see below comment - can actually set both linear and quadratic terms to 0

**simon**on An Actually Intuitive Explanation of the Oberth Effect · 2024-01-15T20:00:04.531Z · LW · GW

The trajectory is changing during the continuous burn, so the average direction of the continuous burn is between perpendicular to where the trajectory was at the start of the burn and where it was at the end. The instantaneous burn, by contrast, is assumed to be perpendicular to where the trajectory was at the start only. If you instead made it in between perpendicular to where it was at the start and where it was at the end, as in the continuous burn, you could make it also not add to the craft's speed.

Going back to the original discussion, yes this means that an instantaneous burn that doesn't change the speed is pointing slightly forward relative to where the rocket was going at the start of the burn, pushing the rocket slightly backward. But, this holds true even if you have a very tiny exhaust mass sent out at a very high velocity, where it obviously isn't going at the same speed as the rocket in the planet's reference frame.
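The condition for an instantaneous burn that leaves the speed unchanged can be written out explicitly. Here $\vec v$ is the velocity just before the burn, $\vec\Delta$ the velocity change it imparts, and $\vec v' = \vec v + \vec\Delta$ the velocity just after (symbols introduced here for illustration):

$$|\vec v'|^2 = |\vec v|^2 \;\iff\; 2\,\vec v\cdot\vec\Delta + |\vec\Delta|^2 = 0 \;\iff\; \vec\Delta\cdot(\vec v + \vec v') = 0$$

So the velocity change is perpendicular to $\vec v + \vec v'$, which points halfway between the initial and final directions when the two speeds are equal. The middle form also gives $\vec v\cdot\vec\Delta = -|\vec\Delta|^2/2 < 0$, which is the slight backward push relative to the initial direction. Nothing in this condition involves the exhaust velocity.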

**simon**on An Actually Intuitive Explanation of the Oberth Effect · 2024-01-13T06:03:58.162Z · LW · GW

> ...Are you just trying to point out that thrusting in opposite directions will cancel out?

No.

I'm pointing out that continuous thrust that's (continuously during the burn) perpendicular to the trajectory doesn't change the speed.

This also means that (going to your epsilon duration case) if the burn is small enough not to change the direction very much, the burn that doesn't change the speed will be close to perpendicular to the trajectory (and in the low mass change (high exhaust velocity) limit it will be close to halfway between the perpendiculars to the trajectory before and after the burn, even if it does change the direction a lot). That's independent of the exhaust velocity, as long as that velocity is high, and when it's high it will also tend not to match the ship's speed since it's much faster, which maybe calls into question your statement in the post, quoted above, which I'll requote:

> One interesting questions is at what angle of thrust does the effect on the propellant go from negative to positive? I didn't do the math to check, but I'm pretty sure it's just the angle at which the speed of the propellant in the planet's reference frame is the exact same as the rocket's speed.

**simon**on An Actually Intuitive Explanation of the Oberth Effect · 2024-01-13T02:30:56.385Z · LW · GW

Yes, it's associative. But if you thrust at 90 degrees to the rocket's direction of motion, you aren't thrusting in a constant direction, but in a changing direction as the trajectory changes. This set of vectors in different directions will add up to a different combined vector than a single vector of the same total length pointing at 90 degrees to the direction of motion that the rocket had at the start of the thrusting.

**simon**on An Actually Intuitive Explanation of the Oberth Effect · 2024-01-12T21:11:30.473Z · LW · GW

In the limit where the retrograde thrust is infinitesimally small, it also does not increase the length of the main vector it is added to.

I implicitly meant, but again did not say explicitly, that the ratio of the contribution to the length of the vector from adding an infinitesimal sideways vector, as compared to the length of that infinitesimal vector, goes to zero as the length of the sideways addition goes to zero (because it scales as the square of the sideways vector).

So adding a large number of tiny instantaneously sideways vectors, in the limit that the size of each goes to zero while holding the total amount of thrust added constant, results in a non-zero change in direction but zero change in speed.

Whereas, if you add a large number of tiny instantaneous aligned vectors, the ratio of the contribution to the length of the vector to the length of each added tiny vector is 1, and if you add up a whole bunch of such additions, it changes the length and not the direction, regardless of how large or small each addition is.
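This limit is easy to check numerically. The sketch below splits a fixed total velocity change into `n` equal instantaneous steps, each either at 90 degrees to the current velocity or along it; the function name and the 2D setup without gravity are assumptions for illustration.

```python
import math

def apply_impulses(speed0, total_dv, n, perpendicular):
    """Add total_dv in n equal instantaneous steps.

    perpendicular=True: each step is at 90 degrees to the current velocity.
    perpendicular=False: each step is along the current velocity.
    Returns (final speed, final direction in radians relative to the start).
    """
    vx, vy = speed0, 0.0
    dv = total_dv / n
    for _ in range(n):
        speed = math.hypot(vx, vy)
        ux, uy = vx / speed, vy / speed  # unit vector along current motion
        if perpendicular:
            # (-uy, ux) is the unit vector rotated by +90 degrees
            vx, vy = vx - uy * dv, vy + ux * dv
        else:
            vx, vy = vx + ux * dv, vy + uy * dv
    return math.hypot(vx, vy), math.atan2(vy, vx)

for n in (1, 10, 100, 10000):
    speed, angle = apply_impulses(1.0, 1.0, n, perpendicular=True)
    print(n, round(speed, 6), round(angle, 6))
```

Each sideways step adds `dv**2` to the squared speed, so the total gain in squared speed is `total_dv**2 / n`, which vanishes as `n` grows while the turn angle does not. With `perpendicular=False` the final speed is `speed0 + total_dv` for every `n`.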

**simon**on An Actually Intuitive Explanation of the Oberth Effect · 2024-01-12T19:18:16.126Z · LW · GW

The Oberth phenomenon is related but different I think

Yes, I think that if you (in addition to the speed thing) also take into account the potential energy of the exhaust, that accounts for the full Oberth effect.

**simon**on An Actually Intuitive Explanation of the Oberth Effect · 2024-01-12T18:52:20.292Z · LW · GW

In the limit where the perpendicular side vector is infinitesimally small, it does not increase the length of the main vector it is added to.

If you keep thrusting over time, as long as you keep the thrust continuously at 90 degrees as the direction changes, the speed will still not change. I implicitly meant, but did not explicitly say, that the thrust is continuously perpendicular in this way. (Whereas, if you keep the direction of thrust fixed when the direction of motion changes so it's no longer at 90 degrees, or add a whole bunch of impulse at one time like shooting a bullet out at 90 degrees, then it will start to add speed).

**simon**on An Actually Intuitive Explanation of the Oberth Effect · 2024-01-11T09:37:36.193Z · LW · GW

I'm not sure my perspective is significantly different than yours, but:

Using conservation of energy: imagine we have a given amount of mechanical (i.e. kinetic+potential) energy produced by expelling exhaust in the rocket's reference frame. The total mechanical energy change will be the same in any reference frame. But in another reference frame we have:

- the faster the rocket is going, the more kinetic energy the exhaust loses (or less it gains, depending on relative speeds) when it is dumped the other way, which means more energy for the rocket.
- the further down a gravity well you dump the exhaust, the less potential energy it has, which means more energy for the rocket.

Both are important from this perspective, but related since kinetic+potential energy is constant when not thrusting, so it's moving faster when it's down in the gravity well. Yeah, it also works with it using a gun or whatever instead of exhaust, but it's more intuitive IMO to imagine it with exhaust.
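The kinetic half of this accounting can be checked with a one-dimensional, gravity-free sketch of a single instantaneous burn. The function name, masses, and speeds are hypothetical; the potential-energy bullet is not modeled here.

```python
def energy_split(v, rocket_mass, exhaust_mass, dv):
    """Kinetic-energy changes (rocket, exhaust) for one instantaneous burn.

    v:  common speed before the burn, in the chosen reference frame.
    dv: speed gained by the rocket (after the exhaust has separated).
    The exhaust's speed loss w follows from momentum conservation.
    """
    w = rocket_mass * dv / exhaust_mass
    rocket_gain = 0.5 * rocket_mass * ((v + dv) ** 2 - v ** 2)
    exhaust_change = 0.5 * exhaust_mass * ((v - w) ** 2 - v ** 2)
    return rocket_gain, exhaust_change

for v in (0.0, 10.0, 100.0):
    rocket_gain, exhaust_change = energy_split(v, 1.0, 0.1, 1.0)
    print(v, rocket_gain, exhaust_change, rocket_gain + exhaust_change)
```

The sum of the two changes is the same at every `v`, while the rocket's share grows with `v` and the exhaust's share shrinks by the same amount, eventually going negative.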

> One interesting questions is at what angle of thrust does the effect on the propellant go from negative to positive? I didn't do the math to check, but I'm pretty sure it's just the angle at which the speed of the propellant in the planet's reference frame is the exact same as the rocket's speed.

I am not quite sure I understand the question, but when the thrust is at 90 degrees to the trajectory, the rocket's speed is unaffected by the thrusting, and it comes out of the gravity well at the same speed as it came in. That would apply equally if there were no gravity well.

**simon**on Saving the world sucks · 2024-01-10T15:49:52.565Z · LW · GW

> I don’t want to tile the universe with hedonium.

Good! I don't want to be replaced with tiled hedonium either.

Perhaps some of the issue might be with perceptions of what is supposed to be "good" not matching your own values.

**simon**on Boltzmann brain's conditional probability · 2023-12-29T19:54:46.273Z · LW · GW

The number of bits in a (Boltzmann or not) brain's beliefs is limited by the number of bits in the brain itself. So I don't think this really works.

OTOH in my view it doesn't make sense to have a policy to believe you are a Boltzmann brain, even if such brains are numerous enough to make ones with your particular beliefs outweigh non-Boltzmann brains with your beliefs, because:

- such a policy will result in you incorrectly believing you are a Boltzmann brain if you aren't a Boltzmann brain, but if you are a Boltzmann brain you either:
  - have such a policy by sheer coincidence, or
  - adopted such a policy based on meta-level reasons that you have no reason to believe are reliable since they came from randomness, and

- even if a Boltzmann brain did obtain correct knowledge, this would not pay off in terms of practical benefits