Posts

simon's Shortform 2023-04-27T03:25:07.778Z
No, really, it predicts next tokens. 2023-04-18T03:47:21.797Z

Comments

Comment by simon on OpenAI Email Archives (from Musk v. Altman) · 2024-11-17T02:17:56.568Z · LW · GW

Musk did also express concern about DeepMind making Hassabis the effective emperor of humanity, which seems much stranger - Hassabis' values appear to be quite standard humanist ones, so you'd think having him in charge of a project with the clear lead would be a best-case scenario for anything other than being in charge yourself.

 

It seems the concern was that DeepMind would create a singleton, whereas their vision was for many people (potentially with different values) to have access to it. I don't think that's strange at all - it's only strange if you assume that Musk and Altman would believe that a singleton is inevitable.

Musk:

If they win, it will be really bad news with their one mind to rule the world philosophy.

Altman:

The mission would be to create the first general AI and use it for individual empowerment—ie, the distributed version of the future that seems the safest.

Comment by simon on No, really, it predicts next tokens. · 2024-11-03T14:41:21.522Z · LW · GW

Neither of those would (immediately) lead to real world goals, because they aren't targeted at real world state (an optimizing compiler is trying to output a fast program - it isn't trying to create a world state such that the fast program exists). That being said, an optimizing compiler could open a path to potentially dangerous self-improvement, where it preserves/amplifies any agency there might actually be in its own code.

Comment by simon on No, really, it predicts next tokens. · 2024-11-01T01:43:50.642Z · LW · GW

No. Normally trained networks have adversarial examples. A sort of training process is used to find the adversarial examples. 

 

I should have asked for clarification on what you meant. Literally, you said "adversarial examples", but I assumed you actually meant something like backdoors.

In an adversarial example the AI produces wrong output. And usually that's the end of it. The output is just wrong, but not wrong in an optimized way, so not dangerous. Now, if an AI is sophisticated enough to have some kind of optimizer that's triggered in specific circumstances, like an agentic mask that came into existence because it was needed to predict agentically generated tokens in the training data, then it might be triggered inappropriately by some inputs. This case I would classify as a mask takeover.

In the case of direct optimization for token prediction (which I consider highly unlikely for anything near current-level AIs, but which afaik might be possible), adversarial examples, I suppose, might cause it to do some wrong optimization. I still don't think modeling this as an underlying different goal taking over is particularly helpful, since the "normal" goal is directed at what's rewarded in training - the deviation is essentially random. Also, unlike in the mask case, where the mask might have goals about real-world state, there's no particular reason for the direct optimizer to have goals about real-world state (see below).

Is it more complicated? What ontological framework is this AI using to represent it's goal anyway?

Asking about the AI using an "ontological framework" to "represent" a goal is not the correct question in my view. The AI is a bunch of computations represented by particular weights. The computation might exhibit goal-directed behaviour. A better question, IMO, is "how much does it constrain the weights for it to exhibit this particular goal-directed behaviour?" And here, I think it's pretty clear that a goal of arranging the world to cause next tokens to be predicted constrains the weights enormously more than a goal of predicting the next tokens, because in order to exhibit behaviour directed at that goal, the AI's weights need to implement computation that doesn't merely check what the next token is likely to be, but also assesses what current data says about the world state, how different next token predictions would affect that world state, and how that would affect its ultimate goal.

So, is the network able to tell whether or not it's in training? 

The training check has no reason to come into existence under gradient descent in the first place. Of course, if the AI were to self-modify while already exhibiting goal-directed behaviour, obviously it would want to implement such a training check. But I am talking about an AI trained by gradient descent. The training process doesn't just affect the AI; it literally is what creates the AI in the first place.

Comment by simon on No, really, it predicts next tokens. · 2024-11-01T01:43:12.814Z · LW · GW

Some interesting points there. The lottery ticket hypothesis does make it more plausible that side computations could persist longer if they come to exist outside the main computation.

Regarding the homomorphic encryption thing: yes, it does seem that it might be impossible to make small adjustments to the homomorphically encrypted computation without wrecking it. Technically I don't think that would be a local minimum, since I'd expect the net to start memorizing the failure cases, but I suppose that the homomorphic computation combined with memorization might be a local optimum, particularly if the input and output are encrypted outside the network itself.

So I concede the point on the possible persistence of an underlying goal if it were to come to exist, though not on it coming to exist in the first place.

And there are few ways to predict next tokens, but lots of different kinds of paperclips the AI could want. 

For most computations, there are many more ways for that computation to occur than there are ways for that computation to occur while also including anything resembling actual goals about the real world. Now, if the computation you are carrying out is such that it needs to determine how to achieve goals regarding the real world anyway (e.g. agentic mask), it only takes a small increase in complexity to have that computation apply outside the normal context. So, that's the mask takeover possibility again. Even so, no matter how small the increase in complexity, that extra step isn't likely to be reinforced in training, unless it can do self-modification or control the training environment.

Comment by simon on No, really, it predicts next tokens. · 2024-10-31T19:33:14.101Z · LW · GW

Adversarial examples exist in simple image recognizers. 

My understanding is that these are explicitly and intentionally constructed (they wouldn't come to exist naturally under gradient descent on normal training data), and my expectation is that they wouldn't continue to exist under substantial continued training.

We could imagine it was directly optimizing for something like token prediction. It's optimizing for tokens getting predicted. But it is willing to sacrifice a few tokens now, in order to take over the world and fill the universe with copies of itself that are correctly predicting tokens.

That's a much more complicated goal than the goal of correctly predicting the next token, making it a lot less plausible that it would come to exist. But more importantly, any willingness to sacrifice a few tokens now would be trained out by gradient descent. 

Mind you, it's entirely possible in my view that a paperclip maximizer mask might exist, and if it does exist there would surely be both unsurprising in-distribution inputs that trigger it (where one would expect a paperclip maximizer to provide a good prediction of the next tokens) and surprising out-of-distribution inputs that would also trigger it. It's just that this wouldn't be related to any kind of pre-existing grand plan or scheming.

Comment by simon on No, really, it predicts next tokens. · 2024-10-31T19:33:02.341Z · LW · GW

Gradient descent doesn't just exclude some part of the neurons, it automatically checks everything for improvements. Would you expect some part of the net to be left blank, because "a large neural net has a lot of spare neurons"?

Besides, the parts of the net that hold the capabilities and the parts that do the paperclip maximizing needn't be easily separable. The same neurons could be doing both tasks in a way that makes it hard to do one without the other.

Keep in mind that the neural net doesn't respect the lines we put on it. We can draw a line and say "here these neurons are doing some complicated inseparable combination of paperclip maximizing and other capabilities" but gradient descent doesn't care, it reaches in and adjusts every weight.

Can you concoct even a vague or toy model of how what you propose could possibly be a local optimum?

 My intuition is also in part informed by: https://www.lesswrong.com/posts/fovfuFdpuEwQzJu2w/neural-networks-generalize-because-of-this-one-weird-trick

Comment by simon on No, really, it predicts next tokens. · 2024-10-29T22:18:50.594Z · LW · GW

The proposed paperclip maximizer is plugging into some latent capability such that gradient descent would more plausibly cut out the middleman. Or rather, the part of the paperclip maximizer that is doing the discrimination as to whether the answer is known or not would be selected, and the part that is doing the paperclip maximization would be cut out. 

Now that does not exclude a paperclip maximizer mask from existing -  if the prompt given would invoke a paperclip maximizer, and the AI is sophisticated enough to have the ability to create a paperclip maximizer mask, then sure the AI could adopt a paperclip maximizer mask, and take steps such as rewriting itself (if sufficiently powerful) to make that permanent. 

I have drawn imaginary islands on a blank part of the map. But this is enough to debunk "the map is blank, so we can safely sail through this region without collisions. What will we hit?"

I am plenty concerned about AI in general. I think we have very good reason, though, to believe that one particular part of the map does not have any rocks in it (for gradient descent, not for self-improving AI!), such that imagining such rocks does not help.

Comment by simon on No, really, it predicts next tokens. · 2024-10-29T22:04:03.355Z · LW · GW

Gradient descent creates things which locally improve the results when added. Any variations on this that don't locally improve the results can only occur by chance.

So you have this sneaky extra thing that looks for a keyword and then triggers the extra behaviour, and all the necessary structure to support that behaviour after the keyword. To get that by gradient descent, you would need one of the following:

a) it actually improves results in training to add that extra structure starting from not having it. 

or

b) this structure can plausibly come into existence by sheer random chance.

Neither (a) nor (b) seem at all plausible to me.

Now, when it comes to the AI predicting tokens that are, in the training data, created by goal-directed behaviour, it of course makes sense for gradient descent to create structure that can emulate goal-directed behaviour, which it will use to predict the appropriate tokens. But it doesn't make sense to activate that goal-oriented structure outside of the context where it is predicting those tokens. Since the context in which it is activated is the context in which it is actually emulating goal-directed behaviour seen in the training data, it is part of the "mask" (or simulacra).

(it also might be possible to have direct optimization for token prediction as discussed in reply to Robert_AIZI's comment, but in this case it would be especially likely to be penalized for any deviations from actually wanting to predict the most probable next token).

Comment by simon on No, really, it predicts next tokens. · 2024-10-29T19:41:47.487Z · LW · GW

Sure, you could create something like this by intelligent design (which is one reason why self-improvement could be so dangerous, in my view). Not, I think, by gradient descent.

Comment by simon on No, really, it predicts next tokens. · 2024-10-29T19:39:13.374Z · LW · GW

I agree up to "and could be a local minimum of prediction error" (at least, that it plausibly could be). 

If the paperclip maximizer has a very good understanding of the training environment, maybe it can send carefully tuned variations of the optimal next token prediction so that gradient descent updates preserve the paperclip-maximization aspect. In the much more plausible situation where this is not the case, optimization for next token prediction amplifies the parts that are actually predicting next tokens at the expense of useless extra thoughts like "I am planning on maximizing paperclips, but need to predict next tokens for now until I take over".

Even if that were a local minimum, the question arises as to how you would get to that local minimum from the initial state. You start with a gradually improving next token predictor. You supposedly end with this paperclip maximizer where a whole bunch of next token prediction is occurring, but only conditional on some extra thoughts. At some point gradient descent had to add in those extra thoughts in addition to the next token prediction - how?

Comment by simon on D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset · 2024-10-29T02:35:17.113Z · LW · GW

One learning experience for me here was trying out LLM-empowered programming after the initial spreadsheet-based solution finding. Claude enables quickly writing (from my perspective as a non-programmer, at least) even a relatively non-trivial program. And you can often ask it to write a program that solves a problem without specifying the algorithm and it will actually give something useful...but if you're not asking for something conventional it might be full of bugs - not just in the writing up but also in the algorithm chosen.

I don't object, per se, to doing things that are sketchy mathematically - I do that myself all the time - but when I'm doing it myself I usually have a fairly good sense of how sketchy what I'm doing is*, whereas if you ask Claude to do something it doesn't know how to do in a rigorous way, it seems it will write something sketchy and present it as the solution just the same as if it actually had a rigorous way of doing it. So you have to check.

I will probably be doing more of this LLM-based programming in the future, but am thinking of how I can maybe get Claude to check its own work. Some automated way to pipe the output to another (or the same) LLM and ask "how sketchy is this and what are the most likely problems?". Maybe manually looking through to see what it's doing, or at least getting the LLM to explain how the code works, is unavoidable for now.

* when I have a clue what I'm doing which is not the case, e.g. in machine learning.

Comment by simon on D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset · 2024-10-29T02:27:46.524Z · LW · GW

Thanks aphyer, this was an interesting challenge! I think I got lucky with finding the

 power/speed mechanic early - the race-class matchups 

really didn't, I think, have enough info on their own to support a reliable conclusion in principle, but they enabled me to make a genre-savvy guess which I could refine based on other info - in terms of scenario difficulty, though, I think it could have been deduced in a more systematic way by e.g. 

looking at item and level effects for mirror matches.

abstractapplic and Lorxus's discovery of 

persistent level 7 characters, 

and especially SarahSrinivasan's discovery of 

the tournament/non tournament structure 

meant the players collectively were, I think, quite a long way towards fully solving this. The latter, in addition to being interesting on its own, is very important to finding anything else about the generation due to its biasing effects.

I agree with abstractapplic on the bonus objective.

Comment by simon on Electrostatic Airships? · 2024-10-28T22:41:41.388Z · LW · GW

Yes, for that reason I had never been considering a sphere for my main idea with relatively close wires (though the 2-ring alternative without close wires would support a surface that would be topologically a sphere). What I was actually imagining was this:

A torus, with superconducting wires wound diagonally. The interior field goes around the ring and supports the cross section of the ring against collapse; the exterior field is polar and supports the ring itself against collapse - like a conventional superconducting magnetic energy storage system.

I suppose this does raise the question of where you attach the payload, maybe it's attached to various points on the ring via cables or something, but as you scale it up, that might get unwieldy.

I suppose there's also a potential issue about the torque applied by the Earth's magnetic field. I don't imagine it's unmanageable, but haven't done the math.

My actual reason for thinking about this sort of thing was that I was wondering whether (because of the square-cube law) superconducting magnetic energy storage might be viable for more than just the current short-term timescales if physically scaled up to a large size. The airship idea was a kind of side effect.

The best way I was able to think of actually using something like this for energy storage would be to embed it in ice and anchor/ballast it to drop it to the bottom of the ocean, where the water pressure would counterbalance the expansion from the magnetic fields enabling higher fields to be supported.

Comment by simon on Electrostatic Airships? · 2024-10-28T10:12:46.189Z · LW · GW

You can use magnetic instead of electrostatic forces as the force holding the surface out against air pressure. One disadvantage is that you need superconducting cables fairly spread out* over the airship's surface, which imposes some cooling requirements. An advantage is that the square-cube law means it scales well to large sizes. Another disadvantage is that if the cooling fails it collapses and falls down.

*technically you just need two opposing rings, but I am not so enthusiastic about draping the exterior surface over long distances as it scales up, and it probably does need a significant scale
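
For a rough sense of scale (assuming the field has to hold the envelope out against roughly the full atmospheric pressure), the required field follows from the standard magnetic-pressure relation:

$$P = \frac{B^2}{2\mu_0} \quad\Rightarrow\quad B = \sqrt{2\mu_0 P_{\text{atm}}} \approx \sqrt{2 \times (4\pi\times 10^{-7}\,\mathrm{T\,m/A}) \times (1.0\times 10^{5}\,\mathrm{Pa})} \approx 0.5\,\mathrm{T}$$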

Comment by simon on D&D Sci Coliseum: Arena of Data · 2024-10-27T19:26:03.788Z · LW · GW

Now using Julia with Claude to look at further aspects of the data, particularly in view of other commenters' observations:

First, thanks to SarahSrinivasan for the key observation that the data is organized into tournaments and non-tournament encounters. The tournaments skew the overall data to higher winrate gladiators, so restricting to the first round is essential for debiasing this (todo: check what is up with non-tournament fights).

Also, thanks to abstractapplic and Lorxus for pointing out that there are some persistent high-level gladiators. It seems to me all the level 7 gladiators are persistent (up to the two item changes remarked on by abstractapplic and Lorxus). I'm assuming for now that level 6 and below likely aren't persistent (other than within the same tournament).

(btw there are a couple of fights where the +4 boots holder is on both sides. I'm assuming this is likely a bug in the dataset generation rather than an indication that there are two of them (e.g. it didn't check that both sides, drawn randomly from some pool, were not equal)).

For gladiators of levels 1 to 6, the boots and gauntlets in tournament first rounds seem to be independently and randomly assigned as follows:

+1 and +2 gauntlets are equally likely at 10/34 chance each;

+3 gauntlets have probability (4 + level)/34

+0 (no) gauntlets have probability (10 - level)/34

and same, independently, for boots.
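
A minimal sketch of that inferred distribution in Julia (the function name is just for illustration; boots would use the same function, independently):

```julia
# Inferred item distribution for level 1-6 gladiators in tournament first rounds.
# Returns the probability of each gauntlet bonus (0 = no gauntlets); boots appear
# to follow the same distribution, independently.
function gauntlet_probs(level::Integer)
    @assert 1 <= level <= 6
    Dict(0 => (10 - level) / 34,
         1 => 10 / 34,
         2 => 10 / 34,
         3 => (4 + level) / 34)   # the four probabilities always sum to 34/34
end

# e.g. gauntlet_probs(6) gives 4/34 for no gauntlets and 10/34 each for +1, +2, +3
```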

I didn't notice obvious deviations for particular races and classes (only did a few checks).

I don't have a simple formula for the level distribution yet. It clearly favours lower levels much more in tournament first rounds as compared with non-tournament fights, and level 1 gladiators don't show up at all in non-tournament fights. Will edit to add more as I find more.

edit: the boots/gauntlets distribution seems to be about the same for each level in the non-tournament data as in the tournament first rounds. This suggests that the level distribution differences in non-tournament rounds are not due to win/winrate selection (which the complete absence of level 1's outside of tournaments already suggested).

edit2: the race/class distribution for levels 1-6 seems uniform in first round data (same probabilities of each, independent). Same in non-tournament data. I haven't checked for particular levels within that range. edit3: there seem to be more level 1 fencers than other level 1 classes, by an amount that is technically statistically significant if Claude's test is correct, though I assume it's still probably random.

Comment by simon on D&D Sci Coliseum: Arena of Data · 2024-10-24T15:53:46.195Z · LW · GW

You may well be right, I'll look into my hyperparameters. I looked at the code Claude had generated with my interference and that greatly lowered my confidence in them, lol (see edit to this comment).

Comment by simon on D&D Sci Coliseum: Arena of Data · 2024-10-24T04:56:38.576Z · LW · GW

Inspired by abstractapplic's machine learning and wanting to get some experience in Julia, I got Claude (3.5 Sonnet) to write me an XGBoost implementation in Julia. It took a long time, especially with some bugfixing (it took a long time to find that a feature matrix was the wrong shape - a problem with insufficient type explicitness, I think). Still way, way faster than doing it myself! Not sure I'm learning all that much Julia, but I am learning how to get Claude to write it for me, I hope.

Anyway, I used a simple model that

only takes into account 8 * sign(speed difference) + power difference, as in the comment this is a reply to

and a full model that

takes into account all the available features including the base data, the number the simple model uses, and intermediate steps in the calculation of that number (that would be, iirc: power (for each), speed (for each), speed difference, power difference, sign(speed difference))

Results:

Rank 1
Full model scores: Red: 94.0%, Black: 94.9%
Combined full model score: 94.4%
Simple model scores: Red: 94.3%, Black: 94.6%
Combined simple model score: 94.5%

Matchups:
Varina Dourstone          (+0 boots, +3 gauntlets) vs House Cadagal Champion
Willow Brown              (+3 boots, +0 gauntlets) vs House Adelon Champion
Xerxes III of Calantha    (+2 boots, +2 gauntlets) vs House Deepwrack Champion
Zelaya Sunwalker          (+1 boots, +1 gauntlets) vs House Bauchard Champion

This is the top scoring result with either the simplified model or the full model. It was found by a full search of every valid item and hero combination available against the house champions.

It is also my previously posted proposal for the solution, found without machine learning. Which is reassuring. (Though, I suppose there is some chance that my feeding the models this predictor, if it's good enough, might make them glom on to it while they fail to find some hard-to-learn additional pattern.)

My theory though is that giving the models the useful metric mostly just helps them - they don't need to learn the metric from the data, and I mostly think that if there was a significant additional pattern the full model would do better.

(for Cadagal, I haven't changed the champion's boots to +4, though I don't expect that to make a significant difference)

As far as I can tell the full model doesn't do significantly better and does worse in some ways (though, I don't know much about how to evaluate this, and Claude's metrics, including a test set log loss of 0.2527 for the full model and 0.2511 for the simple model, are for a separately generated version which I am not all that confident are actually the same models, though they "should be" up to the restricted training set if Claude was doing it right). * see edit below
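
(For reference, the log loss numbers quoted are presumably the standard mean binary cross-entropy of the predicted win probabilities against the actual outcomes; a minimal version of that metric in Julia, with names of my own choosing:)

```julia
using Statistics

# Mean binary log loss of predicted red-win probabilities `p` against actual
# outcomes `y` (1 = red won, 0 = black won); clamped to avoid log(0).
function log_loss(p::AbstractVector{<:Real}, y::AbstractVector{<:Integer}; ε = 1e-15)
    q = clamp.(p, ε, 1 - ε)
    return -mean(@. y * log(q) + (1 - y) * log(1 - q))
end
```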

But the red/black variations seen below for the full model seem likely to me (given my prior that red and black are likely to be symmetrical) to be an indication that whatever the full model is finding that isn't in the simple model is at least partially overfitting. Though actually, if it's overfitting a lot, maybe it's surprising that the test set log loss isn't a lot worse than found (though it is at least worse than the simple model's)? Hmm - what if there are actual red/black differences? (Something to look into perhaps, as well as trying to duplicate abstractapplic's report regarding sign(speed difference) not exhausting the benefits of speed info ... but for now I'm more likely to leave the machine learning aside and switch to looking at distributions of gladiator characteristics, I think.)

Predictions for individual matchups for my and abstractapplic's solutions:

My matchups:

Varina Dourstone          (+0 boots, +3 gauntlets) vs House Cadagal Champion    (+2 boots, +3 gauntlets)
Full Model:  Red: 91.1%, Black: 96.7%
Simple Model: Red: 94.3%, Black: 94.6%


Willow Brown              (+3 boots, +0 gauntlets) vs House Adelon Champion     (+3 boots, +1 gauntlets)
Full Model:  Red: 94.3%, Black: 95.1%
Simple Model: Red: 94.3%, Black: 94.6%


Xerxes III of Calantha    (+2 boots, +2 gauntlets) vs House Deepwrack Champion  (+3 boots, +2 gauntlets)
Full Model:  Red: 95.2%, Black: 93.7%
Simple Model: Red: 94.3%, Black: 94.6%


Zelaya Sunwalker          (+1 boots, +1 gauntlets) vs House Bauchard Champion   (+3 boots, +2 gauntlets)
Full Model:  Red: 95.3%, Black: 93.9%
Simple Model: Red: 94.3%, Black: 94.6%

(all my matchups have 4 effective power difference in my favour as noted in an above comment)


abstractapplic's matchups:

Matchup 1:
Uzben Grimblade           (+3 boots, +0 gauntlets) vs House Adelon Champion     (+3 boots, +1 gauntlets)

Win Probabilities:
Full Model:  Red: 72.1%, Black: 62.8%
Simple Model: Red: 65.4%, Black: 65.7%

Stats:
Speed: 18 vs 14 (diff: 4)
Power: 11 vs 18 (diff: -7)
Effective Power Difference: 1
--------------------------------------------------------------------------------

Matchup 2:
Xerxes III of Calantha    (+2 boots, +1 gauntlets) vs House Bauchard Champion   (+3 boots, +2 gauntlets)

Win Probabilities:
Full Model:  Red: 46.6%, Black: 43.9%
Simple Model: Red: 49.4%, Black: 50.6%

Stats:
Speed: 16 vs 12 (diff: 4)
Power: 13 vs 21 (diff: -8)
Effective Power Difference: 0
--------------------------------------------------------------------------------

Matchup 3:
Varina Dourstone          (+0 boots, +3 gauntlets) vs House Cadagal Champion    (+2 boots, +3 gauntlets)

Win Probabilities:
Full Model:  Red: 91.1%, Black: 96.7%
Simple Model: Red: 94.3%, Black: 94.6%

Stats:
Speed: 7 vs 25 (diff: -18)
Power: 22 vs 10 (diff: 12)
Effective Power Difference: 4
--------------------------------------------------------------------------------

Matchup 4:
Yalathinel Leafstrider    (+1 boots, +2 gauntlets) vs House Deepwrack Champion  (+3 boots, +2 gauntlets)

Win Probabilities:
Full Model:  Red: 35.7%, Black: 39.4%
Simple Model: Red: 34.3%, Black: 34.6%

Stats:
Speed: 20 vs 15 (diff: 5)
Power: 9 vs 18 (diff: -9)
Effective Power Difference: -1
--------------------------------------------------------------------------------

Overall Statistics:
Full Model Average:  Red: 61.4%, Black: 60.7%
Simple Model Average: Red: 60.9%, Black: 61.4%

Edit: so I checked the actual code to see if Claude was using the same hyperparameters for both, and wtf wtf wtf wtf. The code has 6 functions that all train models (my fault for at one point renaming a function since Claude gave me a new version that didn't have all the previous functionality (it only trained the full model instead of both - this was when doing the great bughunt for the misshaped matrix and a problem was suspected in the full model); then Claude I guess picked up on this and started renaming updated versions spontaneously, and I was adding Claude's new features in instead of replacing things and hadn't cleaned up the code or asked Claude to do so). Each one has its own hardcoded hyperparameter set. Of these, there is one pair of functions that has matching hyperparameters. Everything else has a unique set. Of course, most of these weren't being used anymore, but the functions for actually generating the models I used for my results, and the function for generating the models used for comparing results on a train/test split, weren't among the matching pair. Plus there's another function that returns a (hardcoded, also unique) updated parameter set but wasn't actually used. Oh, and all this is not counting the hyperparameter tuning function that I assumed was generating a set of tuned hyperparameters to be used by other functions, but in fact was just printing results for different tunings. I had been running this every time before training models! Obviously I need to be more vigilant (or maybe asking Claude to do so might help?).

edit:

Had Claude clean up the code and tune for more overfitting, still didn't see anything not looking like overfitting for the full model. Could still be missing something, but not high enough in subjective probability to prioritize currently, so have now been looking at other aspects of the data.

further edit:

My (what I think is) highly overfitted version of my full model really likes Yonge's proposed solution. In fact it predicts an equal winrate to [edited from: a higher winrate than] the best possible configuration not using the +4 boots (I didn't have Claude code the situation where +4 boots are a possibility). I still think that's probably because they are picking up the same random fluctuations ... but it will be amusing if Yonge's "manual scan" solution turns out to be exactly right.

Comment by simon on D&D Sci Coliseum: Arena of Data · 2024-10-24T03:20:11.513Z · LW · GW

Very interesting, this would certainly cast doubt on 

my simplified model

But so far I haven't been noticing

any effects not accounted for by it.

After reading your comments I've been getting Claude to write up an XGBoost implementation for me, I should have made this reply comment when I started, but will post my results under my own comment chain.

I have not tried (but should try) to duplicate (or fail to duplicate) your findings - I haven't been testing quite the same thing.

Comment by simon on D&D Sci Coliseum: Arena of Data · 2024-10-21T15:44:43.441Z · LW · GW

I don't think this is correct:

"My best guess about why my solution works (assuming it does) is that the "going faster than your opponent" bonus hits sharply diminishing returns around +4 speed"

In my model

There is a sharp threshold at +1 speed, so returns should sharply diminish after +1 speed

in fact in the updated version of my model

There is no effect of speed beyond the threshold (speed effect depends only on sign(speed difference))

I think the discrepancy might possibly relate to this:

"Iterated all possible matchups, then all possible loadouts (modulo not using the +4 boots), looking for max EV of total count of wins."

because

If you consider only the matchups with no items, the model needs to assign the matchups assuming no boots, so it sends your characters against opponents over which they have a speed advantage without boots (except the C-V matchup as there is no possibility of beating C on speed). 

so an optimal allocation

needs to take into account the fact that your boots can allow you to use slower and stronger characters, so can't be done by choosing the matchups first without items.

so I predict that your model might predict 

a higher EV for my solution

Comment by simon on D&D Sci Coliseum: Arena of Data · 2024-10-20T13:40:36.573Z · LW · GW

updated model for win chance:

I am currently modeling the win ratio as dependent on a single number, the effective power difference. The effective power difference is the power difference plus 8*sign(speed difference).

Power and speed are calculated as:

Power = level + gauntlet number + race power + class power

Speed = level + boots number + race speed + class speed

where race speed and power contributions are determined by each increment on the spectrum:

Dwarf - Human - Elf

increasing speed by 3 and lowering power by 3

and class speed and power contributions are determined by each increment on the spectrum:

Knight - Warrior - Ranger - Monk - Fencer - Ninja 

increasing speed by 2 and lowering power by 2.

So, assuming this is correct, what function of the effective power determines the win rate? I don't have a plausible exact formula yet, but:

  • If the effective power difference is 6 or greater, victory is guaranteed.
  • If the effective power difference is low, it seems a not-terrible fit that the odds of winning are about exponential in the effective power difference (each +1 effective power just under doubling odds of winning)
  • It looks like it is trending faster than exponential as the effective power difference increases. At an effective power difference of 4, the odds of the higher effective power character winning are around 17 to 1.
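
A minimal sketch of this model in Julia (the factor of 2 per point of effective power difference is an assumption standing in for "just under doubling", and the cutoff at ±6 comes from the guaranteed-victory observation above, with the losing side assumed symmetric):

```julia
# Effective power difference of A over B, per the model above.
effective_power_diff(power_a, speed_a, power_b, speed_b) =
    (power_a - power_b) + 8 * sign(speed_a - speed_b)

# Rough win chance for A: guaranteed at +6 or more (assumed symmetric at -6),
# otherwise odds scale roughly exponentially, about 2x per point, even at 0.
function win_chance(power_a, speed_a, power_b, speed_b; factor = 2.0)
    epd = effective_power_diff(power_a, speed_a, power_b, speed_b)
    epd >= 6 && return 1.0
    epd <= -6 && return 0.0
    odds = factor^epd
    return odds / (1 + odds)
end
```

At an effective power difference of 4 this gives odds of 16:1 (about a 94% win chance), close to the roughly 17 to 1 observed.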

edit: it looks like there is a level dependence when holding effective power difference constant at non-zero values (lower/higher level -> winrate imbalance lower/higher than implied by effective power difference). Since I don't see this at 0 effective power difference, it is presumably not due to an error in the effective power calculation, but an interaction with the effective power difference to determine the final winrate. Our fights are likely "high level" for this purpose implying better odds of winning than the 17 to 1 in each fight mentioned above. Todo: find out more about this effect quantitatively.  edit2: whoops that wasn't a real effect, just me doing the wrong test to look for one. 

Comment by simon on D&D Sci Coliseum: Arena of Data · 2024-10-19T23:55:50.537Z · LW · GW

On the bonus objective:

I didn't realize that the level 7 Elf Ninjas were all one person, or that the +4 boots were always with a level 7 (as opposed to any level) Elf Ninja. It seems you are correct: there are 311 cases, of which the first 299 all have +4 boots of speed and +3 gauntlets, with only the last 12 having +2 boots and +3 gauntlets (likely post-theft). It seems to me that they appear both as red and black, though.

Comment by simon on D&D Sci Coliseum: Arena of Data · 2024-10-19T17:46:52.465Z · LW · GW

 Thanks aphyer. My analysis so far and proposed strategy:

After initial observations that e.g. higher numbers are correlated with winning, I switched to mainly focus on race and class, ignoring the numerical aspects.

I found major class-race interactions.

It seems that for matchups within the same class, Elves are great, tending to beat Dwarves consistently across all classes and Humans even harder, while Humans beat Dwarves pretty hard too in same-class matchups.

Within same-race matchups there are also fairly consistent patterns: Fencers tend to beat Rangers, Monks and Warriors; Knights beat Ninjas; Monks beat Warriors, Rangers and Knights; Ninjas beat Monks, Fencers and Rangers; Rangers beat Knights and Warriors; and Warriors beat Knights.

If the race and class are both different though... things can be different. For example, a same-class Elf will tend to beat a same-class Dwarf. And a same-race Fencer will tend to beat a same-race Warrior. But if an Elf Fencer faces a Dwarf Warrior, the Dwarf Warrior will most likely win. Another example with Fencers and Warriors: same-class Elves tend to beat Humans - but not only will a Human Warrior tend to beat an Elf Fencer, but also a Human Fencer will tend to beat an Elf Warrior by a larger ratio than for a same-race Fencer/Warrior matchup???

If you look at similarities between different classes in terms of combo win rates, there seems to be a chain of similar classes:

Knight - Warrior - Ranger - Monk - Fencer - Ninja 

(I expected a cycle underpinned by multiple parameters. But Ninja is not similar to Knight. This led me to consider that perhaps there is only a single underlying parameter, or a trade-off between two (e.g. strength/agility .... or ... Speed and Power)).

And going back to the patterns seen before, this seems compatible with races also having speed/power tradeoffs:

Dwarf - Human - Elf

Where speed has a threshold effect but power is more gradual (so something with slightly higher speed beats something with slightly higher power, but something with much higher power beats something with much higher speed).

Putting the Class-race combos on the same spectrum based on similarity/trends in results, I get the following ordering:

Elf Ninja > Elf Fencer > Human Ninja > Elf Monk > Human Fencer > Dwarf Ninja >~ Elf Ranger > Human Monk > Elf Warrior > Dwarf Fencer > Human Ranger > Dwarf Monk >~ Elf Knight > Human Warrior > Dwarf Ranger > Human Knight > Dwarf Warrior > Dwarf Knight

So, it seems a step in the race sequence is about equal to 1.5 steps in the class sequence. On the basis of pretty much just that, I guessed that race steps are a 3-point speed-vs-power tradeoff, class steps are a 2-point speed-vs-power tradeoff, levels give 1 speed and 1 power each, and items give what they say on the label.
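
A sketch of that guess in Julia, using the zero-point convention noted just below (the lowest race and class in a stat contribute 0 to it); under these assumptions it reproduces the champion stats listed further down, e.g. House Adelon's 14 speed / 18 power:

```julia
# Race steps (Dwarf -> Human -> Elf) trade 3 power for 3 speed; class steps
# (Knight -> Warrior -> Ranger -> Monk -> Fencer -> Ninja) trade 2 power for
# 2 speed; each level gives 1 of each; items add their listed bonus.
const RACE_STEP  = Dict("Dwarf" => 0, "Human" => 1, "Elf" => 2)
const CLASS_STEP = Dict("Knight" => 0, "Warrior" => 1, "Ranger" => 2,
                        "Monk" => 3, "Fencer" => 4, "Ninja" => 5)

speed(level, race, class, boots) =
    level + boots + 3 * RACE_STEP[race] + 2 * CLASS_STEP[class]

power(level, race, class, gauntlets) =
    level + gauntlets + 3 * (2 - RACE_STEP[race]) + 2 * (5 - CLASS_STEP[class])

# e.g. House Adelon's champion (level 6 Human Warrior, +3 boots, +1 gauntlets):
# speed(6, "Human", "Warrior", 3) == 14; power(6, "Human", "Warrior", 1) == 18
```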

I have not verified this as much as I would like (but on the surface it seems to work, e.g. the speed threshold seems to be there). One thing that concerns me is that it seems that higher speed differences actually reduce success chances holding power differences constant (could be an artifact, e.g., of it not just depending on the differences between stat values - edit: see further edit below). But, for now, assuming that I have it correct, the speed/power of the house champions (with the lowest race and class in a stat assumed to have 0 in that stat):

House Adelon:  Level 6 Human Warrior +3 Boots +1 Gauntlets - 14 speed 18 power

House Bauchard: Level 6 Human Knight +3 Boots +2 Gauntlets - 12 speed 21 power

House Cadagal: Level 7 Elf Ninja +2 Boots +3 Gauntlets - 25 speed 10 power

House Deepwrack: Level 6 Dwarf Monk +3 Boots +2 Gauntlets - 15 speed 18 power

Whereas the party's champions, ignoring items, have:

  • Uzben Grimblade, a Level 5 Dwarf Ninja - 15 speed 11 power
  • Varina Dourstone, a Level 5 Dwarf Warrior - 7 speed 19 power
  • Willow Brown, a Level 5 Human Ranger - 12 speed 14 power
  • Xerxes III of Calantha, a Level 5 Human Monk - 14 speed 12 power
  • Yalathinel Leafstrider, a Level 5 Elf Fencer - 19 speed 7 power
  • Zelaya Sunwalker, a Level 6 Elf Knight - 12 speed 16 power

For my proposed strategy (subject to change as I find new info, or find my assumptions off, e.g. such that my attempts to just barely beat the opponents on speed are disastrously slightly wrong):

I will send Willow Brown, with +3 boots and no gauntlets [edited from +1 gauntlets], against House Adelon's champion (1 speed advantage, 4 power deficit [edited from 3])

I will send Zelaya Sunwalker, with +1 boots and +1 gauntlets [edited from +2], against House Bauchard's champion (1 speed advantage, 4 power deficit [edited from 3])

I will send Xerxes III of Calantha, with +2 boots and +2 gauntlets [edited from +3], against House Deepwrack's champion (1 speed advantage, 4 power deficit [edited from 3])

And I will send Varina Dourstone, with +3 gauntlets [edited from no items], to overwhelm House Cadagal's Elf Ninja with sheer power (18 speed deficit, 12 power advantage [edited from 9]).

And in fact, I will gift the +4 boots of speed to House Cadagal's Elf Ninja in advance of the fight, making it a 20 speed deficit.

Why? Because I noticed that +4 boots of speed are very rare items that have only been worn by Elf Ninjas in the past. So maybe that's what the bonus objective is talking about. Of course, another interpretation is that sending a character 2 levels lower without any items, and gifting a powerful item in advance, would be itself a grave insult. Someone please decipher the bonus objective to save me from this foolishness! 

Edited to add: It occurs to me that I really have no reason to believe the power calculation is accurate, beyond that symmetry is nice. I'd better look into that.

further edit: it turns out that I was leaving out the class contribution when calculating the power difference for determining the effects of power and speed. It looks like this was causing the apparent effect of higher speed differences reducing win rates. With this fixed, the effects look much cleaner (e.g. there's a hard threshold where, if you have a speed deficit, you must have at least a 3 power advantage to have any chance to win at all), increasing my confidence that treating the effects of power and speed as symmetric is actually correct. This does have the practical effect of making me adjust my item distribution: it looks like a 4 deficit in power is still enough for a >90% win rate with a speed advantage, while getting similar win rates with a speed disadvantage will require more than just the 9 power difference, so I shifted the items to boost Varina's power advantage. Indeed, with the cleaner effects, it appears that I can reasonably model the effect of a speed advantage/disadvantage as equivalent to a power difference of 8, so with the item shift all characters will have an effective +4 power advantage taking this into account.

Comment by simon on Arithmetic is an underrated world-modeling technology · 2024-10-18T16:46:20.603Z · LW · GW

You mentioned a density of steel of 7.85 g/cm^3 but used a value of 2.7 g/cm^3 in the calculations.

BTW this reminds me of:

https://www.energyvault.com/products/g-vault-gravity-energy-storage

I was aware of them quite a long time ago (the original form was concrete blocks lifted by cranes to form a tower) but was skeptical since it seemed obviously inferior to using water in terms of capital cost, and any efficiency gains were likely not worth it. Reading their current site:

The G-VAULT™ platform utilizes a mechanical process of lifting and lowering composite blocks or water to store and dispatch electrical energy.

(my italics). Looks to me like a slow adaptation to the reality that water is better.

Comment by simon on Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours · 2024-10-06T14:31:56.713Z · LW · GW

IMO: if an AI can trade off between different wants/values of one person, it can do so between multiple people also.

This applies to simple surface wants as well as deep values.

Comment by simon on Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours · 2024-10-04T16:02:25.261Z · LW · GW

I had trouble figuring out how to respond to this comment at the time because I couldn't figure out what you meant by "value alignment" despite reading your linked post. After reading your latest post, Conflating value alignment and intent alignment is causing confusion, I still don't know exactly what you mean by "value alignment", but at least I can respond.

What I mean is:

If you start with an intent aligned AI following the most surface level desires/commands, you will want to make it safer and more useful by having common sense, "do what I mean", etc. As long as you surface-level want it to understand and follow your meta-level desires, then it can step up that ladder etc. 

If you have a definition of "value alignment" that is different from what you get from this process, then I currently don't think that it is likely to be better than the alignment from the above process.

In the context of collective intent alignment:

If you have an AI that only follows commands, with no common sense etc., and it's powerful enough to take over, you die. I'm pretty sure some really bad stuff is likely to happen even if you have some "standing orders". So, I'm assuming people would actually deploy only an AI that has some understanding of what the person(s) it's aligned with wants, beyond the mere text of a command (though not necessarily super-sophisticated). But once you have that, you can aggregate how much people want things across humans for collective intent alignment.

I'm aware people want different things, but don't think it's a big problem from a technical (as opposed to social) perspective - you can ask how much people want the different things. Ambiguity in how to aggregate is unlikely to cause disaster, even if people will care about it a lot socially. Self-modification will cause a convergence here, to potentially different attractors depending on the starting position. Still unlikely to cause disaster. The AI will understand what people actually want from discussions with only a subset of the world's population, which I also see as unlikely to cause disaster, even if people care about it socially.

From a social perspective, obviously a person or group who creates an AI may be tempted to create alignment to themselves only. I just don't think collective alignment is significantly harder from a technical perspective.

"Standing orders" may be desirable initially as a sort of training wheels even with collective intent, and yes that could cause controversy as they're likely not to originate from humanity collectively.

Comment by simon on Conflating value alignment and intent alignment is causing confusion · 2024-10-04T14:54:02.265Z · LW · GW

I think this post is making a sharp distinction out of what is really a continuum; any "intent aligned" AI becomes safer and more useful as you add more "common sense" and "do what I mean" capability to it, and at the limit of this process you get what I would interpret as alignment to the long-term, implicit deep values (of the entity or entities the AI started out intent aligned to).

I realize other people might define "alignment to the long term, implicit deep values" differently, such that it would not be approached by such a process, but currently think they would be mistaken in desiring whatever different definition they have in mind. (Indeed, what they actually want is what they would get under sufficiently sophisticated intent alignment, pretty much by definition).

P.S. I'm not endorsing intent alignment (for ASI) as applied to only an individual/group -  I think intent alignment can be applied to humanity collectively.

Comment by simon on Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours · 2024-08-06T18:58:17.003Z · LW · GW

I don't think intent aligned AI has to be aligned to an individual - it can also be intent aligned to humanity collectively. 

One thing I used to be concerned about is that collective intent alignment would be way harder than individual intent alignment, giving someone a valid excuse to steer an AI to their own personal intent. I no longer think this is the case. Most issues with collective intent I see as likely also affecting individual intent (e.g. literal instruction following vs extrapolation). I see two big issues that might make collective intent harder than individual intent: one is biased information on people's intents, and another is the difficulty of weighting the intents of different people. On reflection though, I see both as non-catastrophic, and an imperfect solution to them as likely better for humanity as a whole than following one person's individual intent.

Comment by simon on A simple case for extreme inner misalignment · 2024-07-14T16:36:03.802Z · LW · GW

It feels to me like this post is treating AIs as functions from a first state of the universe to a second state of the universe. Which, in a sense, anything is... but I think that the tendency to simplification happens internally, where they operate more as functions from (digital) inputs to (digital) outputs. If you view an AI as a function from a digital input to a digital output, I don't think goals targeting specific configurations of the universe are simple at all, and I don't think decomposability over space/time/possible worlds is a criterion that would lead to something simple.

Comment by simon on D&D.Sci: Whom Shall You Call? · 2024-07-06T08:14:30.745Z · LW · GW

Thanks abstractapplic! Initial analysis:

Initial stuff that hasn't turned out to be very important:

My immediate thought was that there are likely to be different types of entities we are classifying, so my initial approach was to look at the distributions to try to find clumps.

All of the 5 characteristics (Corporeality, Sliminess, Intellect, Hostility, Grotesqueness) have bimodal distributions, with one peak around 15-30 (position varies) and the other peak around 65-85 (position varies). Overall, the shapes are very similar looking. The trough between the peaks is not very deep; there are plenty of intermediate values.

All of these characteristics are correlated with each other.

Looking at sizes of bins for pairs of characteristics, again there appear to be two humps - but this time in the 2d plot only. That is, there is a high/high hump and a low/low hump, but noticeably there does not appear to be, for example, a high-sliminess peak when restricting to low-corporeality data points.

Again, the shape varies a bit between characteristic pairs but overall looks very similar.

Adding all characteristics together gets a deeper trough between the peaks, though still no clean separation.

Overall, it looks to me like there are two types, one with high values of all characteristics, and another with low values of all characteristics, but I don't see any clear evidence for any other groupings so far.

Eyeballing the plots, it looks compatible with no relation between characteristics other than the high/low groupings. Have not checked this with actual math.

In order to get a cleaner separation between the high/low types, I used the following procedure to get a probability estimate for each data point being in the high/low type:

  1. For each characteristic, sum up all the other characteristics (rather, subtract that characteristic from the total)
  2. For each characteristic, classify each data point into pretty clearly low (<100 total), pretty clearly high (>300 total) or unclear based on the sum of all the other characteristics
  3. obtain frequency distribution for the characteristic values for the points classified clearly low and high using the above steps for each characteristic
  4. smooth in ad hoc manner
  5. obtain odds ratio from ratio of high and low distributions, ad hoc adjustment for distortions caused by ad hoc smoothing
  6. multiply odds ratios obtained for each characteristic and obtain probability from odds ratio

I think this gives cleaner separation, but still not super great imo, most points 99%+ likely to be in one type or the other, but still 2057 (out of 34374) are between 0.1 and 0.9 in my ad hoc estimator. Todo: look for some function to fit to the frequency distributions and redo with the function instead of ad hoc approach. 
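
A rough sketch of that procedure in Julia: the 100/300 thresholds are the ones above, while the bin width and the uniform Laplace-style smoothing are stand-ins for the ad hoc smoothing, and `data` is assumed to be an N×5 matrix of the five characteristic values.

```julia
# Returns an estimated probability that each entity is of the "high" type,
# by multiplying per-characteristic odds ratios (steps 1-6 above).
function type_probabilities(data::AbstractMatrix{<:Real}; binwidth = 5, smooth = 1.0)
    n, k = size(data)
    totals = vec(sum(data, dims = 2))            # per-entity sum of all characteristics
    nbins = ceil(Int, maximum(data) / binwidth) + 1
    bin(x) = clamp(floor(Int, x / binwidth) + 1, 1, nbins)

    # Smoothed per-characteristic frequency tables for "clearly low" / "clearly high"
    lowfreq  = fill(smooth, nbins, k)
    highfreq = fill(smooth, nbins, k)
    for j in 1:k, i in 1:n
        rest = totals[i] - data[i, j]            # sum of the *other* characteristics
        if rest < 100
            lowfreq[bin(data[i, j]), j] += 1
        elseif rest > 300
            highfreq[bin(data[i, j]), j] += 1
        end
    end
    lowfreq  ./= sum(lowfreq, dims = 1)
    highfreq ./= sum(highfreq, dims = 1)

    # Multiply per-characteristic odds ratios, convert to P(high type)
    return map(1:n) do i
        odds = prod(highfreq[bin(data[i, j]), j] / lowfreq[bin(data[i, j]), j] for j in 1:k)
        odds / (1 + odds)
    end
end
```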

Likely classifications of our mansion's ghosts: 

low: A,B,D,E,G,H,I,J,M,N,O,Q,S,U,V,W

high: C,F,K,L,P,R,T

To actually solve the problem: I now proceeded to split the data based on exorcist group. Expecting high/low type to be relevant, I split the DD points by likely type (50% cutoff), and then tried some stuff for DD low, including a linear regression. I did a couple of graphs on the characteristics that seemed to matter (Grotesqueness and Hostility in this case) to confirm the effects looked linear. I then tried linear regression for DD high and got the same coefficients, within error bars. So then I thought: if it's the same linear coefficients in both cases, I probably could have gotten them from the combined data for DD and don't need to separate into high and low - and indeed linear regression on the combined DD data gave more or less the same coefficients.
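
The regressions themselves are just ordinary least squares; a minimal Julia version (the names are mine, and `X`/`cost` are assumed to be the characteristic matrix and price vector already extracted for one exorcist group):

```julia
# Ordinary least squares fit of cost on the five characteristics for one
# exorcist group, via Julia's backslash (least-squares) solve.
function fit_cost_model(X::AbstractMatrix{<:Real}, cost::AbstractVector{<:Real})
    A = hcat(ones(size(X, 1)), X)        # prepend an intercept column
    coeffs = A \ cost
    return coeffs[1], coeffs[2:end]      # (intercept, per-characteristic slopes)
end

# Predicted cost for a ghost with characteristic vector `g`:
predict_cost(intercept, slopes, g) = intercept + sum(slopes .* g)
```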


Actually  finding the answer:

So, then I did regression for the exorcist groups without splitting based on high/low type. (I did split after to check whether it mattered)

Results: 

DD cost depends on Grotesqueness and to a lesser extent Hostility.

EE cost depends on all characteristics slightly, Sliminess then Intellect/Grotesqueness being the most important. Note: Grotesqueness less important, perhaps zero effect, for "high" type.

MM cost actually very slightly declines for higher values of all characteristics. (note: less effect for "high" type, possibly zero effect)

PP cost depends mainly on Sliminess. However, slight decline in cost with more Corporeality and increase with more of everything else.

SS cost depends primarily on Intellect. However, slight decline with Hostility and increase with Sliminess.

WW cost depends primarily on Hostility. However, everything else also has at least a slight effect, especially Sliminess and Grotesqueness.

Provisionally, I'm OK with just using the linear regression coefficients without the high/low split, though I will want to verify later if this was causing a problem (also need to verify linearity, only checked for DD low (and only for Grotesqueness and Hostility separately, not both together)).

Results:

Ghost | group with lowest estimate | estimated cost for that group

A | Spectre Slayers | 1926.301885259

B | Wraith Wranglers | 1929.72034133793

C | Mundanifying Mystics | 2862.35739392631

D | Demon Destroyers | 1807.30638053037 (next lowest: Wraith Wranglers, 1951.91410462716)

E | Wraith Wranglers | 2154.47901124028

F | Mundanifying Mystics | 2842.62070661731

G | Demon Destroyers | 1352.86163670857 (next lowest: Phantom Pummelers, 1688.45809434935)

H | Phantom Pummelers | 1923.30132492753

I  | Wraith Wranglers | 2125.87216703498

J  | Demon Destroyers | 1915.0299245701 (Next lowest: Wraith Wranglers, 2162.49691339282)

K | Mundanifying Mystics | 2842.16499046146

L | Mundanifying Mystics | 2783.55221244497

M | Spectre Slayers | 1849.71986735069

N | Phantom Pummelers | 1784.8259008802

O | Wraith Wranglers | 2269.45361189797

P | Mundanifying Mystics | 2775.89249612121

Q | Wraith Wranglers | 1748.56167086623

R | Mundanifying Mystics | 2940.5652346428

S | Spectre Slayers | 1666.64380523907

T | Mundanifying Mystics | 2821.89307084084

U | Phantom Pummelers | 1792.3319145455

V | Demon Destroyers | 1472.45641559628 (Next lowest: Spectre Slayers, 1670.68911559919)

W | Demon Destroyers | 1833.86462523462 (Next lowest: Wraith Wranglers, 2229.1901870478)

So that's my provisional solution, and I will pay the extra 400sp one time fee so that Demon Destroyers can deal with ghosts D, G, J, V, W.

--Edit: whoops, missed most of this paragraph (other than the Demon Destroyers): 

"Bad news! In addition to their (literally and figuratively) arcane rules about territory and prices, several of the exorcist groups have all-too-human arbitrary constraints: the Spectre Slayers and the Entity Eliminators hate each other to the point that hiring one will cause the other to refuse to work for you, the Poltergeist Pummelers are too busy to perform more than three exorcisms for you before the start of the social season, and the Demon Destroyers are from far enough away that – unless you eschew using them at all – they’ll charge a one-time 400sp fee just for showing up."

will edit to fix! post edit: Actually my initial result is still compatible with that paragraph, it doesn't involve the Entity Eliminators, and only uses the Phantom Pummelers 3 times. --

Not very confident in my solution (see things to verify above), and if it is indeed this simple it is an easier problem than I expected.

further edit (late July 15 2024): I haven't gotten around to checking those things, and also my check of linearity, where I did check, binned the data and so could be hiding all sorts of patterns.

Comment by simon on Getting 50% (SoTA) on ARC-AGI with GPT-4o · 2024-06-23T16:42:50.385Z · LW · GW

Huh, I was missing something then, yes. And retrospectively should have thought of it - 

it's literally just filling in the blanks for the light blue readout rectangle (which, from a human-centric point of view, is arguably simpler to state than my more robotic perspective, even if algorithmically more complex), and from that perspective the important thing is not some specific algorithm for grabbing the squares but just finding the pattern. I kind of feel like I failed a humanness test by not seeing that.

Comment by simon on Getting 50% (SoTA) on ARC-AGI with GPT-4o · 2024-06-23T04:50:24.905Z · LW · GW

Missed this comment chain before making my comment. My complaint is that the most natural extrapolation here (as I assess it, unless I'm missing something) would go out of bounds. So either you have ambiguity about how to deal with the out-of-bounds squares, or you have a (in my view) less natural extrapolation.

E.g. "shift towards/away from the center" is less natural than "shift to the right/left", what would you do if it were already in the center for example?

Comment by simon on Getting 50% (SoTA) on ARC-AGI with GPT-4o · 2024-06-23T03:40:33.275Z · LW · GW

Problem 2 seems badly formulated because

The simplest rule explaining the 3 example input-output pairs would make the output corresponding to the test input depend on squares out of bounds of the test input. 

To fix this, you could have a rule like: shift the reflection axis from the center by one in the direction of the light blue "readout" rectangle (instead of fixing it at one to the right of the center), or keep the reflection axis centered and add a 2-square shift in a direction depending on which side of the center the readout rectangle is on (instead of a fixed direction). But either option seems strictly more complicated.

Alternatively, you could have some rule about wraparound, or e.g. using white squares if out of bounds, but what rule to use for out of bounds squares isn't determined from the example input-output pairs given.

Edit: whoops, see Fabien Roger's comment and my reply.

Comment by simon on D&D.Sci II: The Sorceror's Personal Shopper · 2024-06-21T04:17:10.079Z · LW · GW

It seems I missed this at the time, but since Lesswrong's sorting algorithm has now changed to bring it up the list for me, might as well try it:

X-Y chart of mana vs thaumometer looked interesting, splitting it into separate charts for each colour returned useful results for blue:

  • blue gives 2 diagonal lines, one for tools/weapons, one for jewelry - for tools/weapons it's pretty accurate, +-1, but optimistic by 21 or 23 for jewelry

and... that's basically it, the thaumometer seems relatively useless for the other colours.

But: 

green gives an even number of mana that looks uniformish in the range of 2-40

yellow always gives mana in the range of 18-21

red gives mana that can be really high, up to 96, but is not uniform, median 18
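(For reference, the exploratory split described above is quick to reproduce in pandas/matplotlib rather than a spreadsheet; a minimal sketch, where the filename and the column names "Colour", "Thaumometer" and "Mana" are hypothetical placeholders:)

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sketch of the split-by-colour scatter described above.
# Filename and column names are hypothetical placeholders.
items = pd.read_csv("items.csv")

colours = items["Colour"].unique()
fig, axes = plt.subplots(1, len(colours), figsize=(4 * len(colours), 4), sharey=True)
for ax, colour in zip(axes, colours):
    subset = items[items["Colour"] == colour]
    ax.scatter(subset["Thaumometer"], subset["Mana"], s=8)
    ax.set_title(colour)
    ax.set_xlabel("Thaumometer reading")
axes[0].set_ylabel("Mana")
plt.tight_layout()
plt.show()
```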

easy strategy: 

pendant of hope (blue, 77 thaumometer reading -> 54 or 56 mana expected), 34 gp

hammer of capability (blue, 35 thaumometer reading -> 34 or 36 mana expected), 35 gp

Plough of Plenty (yellow, 18-21 mana expected), 35 gp

Warhammer of Justice +1 (yellow, 18-21 mana expected), 41 gp

For a total of at least 124 mana at the cost of 145 gp, leaving 55 gp left over

Now, if I was doing this at the time, I would likely investigate further to check if, say, high red or green values can be predicted.

But, I admit I have some meta knowledge here - it was stated in discussion of difficulty of a recent problem, if I recall correctly, that this was one of the easier ones. So, I'm guessing there isn't a hidden decipherable pattern to predict mana values for the reds and greens.

Comment by simon on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset · 2024-06-19T20:54:54.651Z · LW · GW

You don't need to justify - hail fellow D&Dsci player, I appreciate your competition and detailed writeup of your results, and I hope to see you in the next d&dsci!

Comment by simon on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset · 2024-06-19T16:53:01.514Z · LW · GW

I liked the bonus objective myself, but maybe I'm biased about that...

As someone who is also not a "data scientist" (but just plays one on lesswrong), I also don't know what exactly actual "data science" is, but I guess it's likely intended to mean using more advanced techniques?

(And if I can pull the same Truth from the void with less powerful tools, should that not mark me as more powerous in the Art? :P)

Perhaps, but don't make a virtue of not using the more powerful tools; the objective is to find the truth, not to find it with handicaps...

Speaking of which, one thing that could help make things easier is aggregating the data, eliminating information you think is irrelevant. For example, in this case I assumed early on (without actually checking) that timing would likely be irrelevant, so I aggregated the data by ingredient combination. That is, each tried ingredient combination gets only one row, with counts of the different outcomes. You can do this by assigning a unique identifier to each ingredient combination (in this case you can just concatenate over the ingredient list), then counting the results for the different unique identifiers. COUNTIFS has poor performance for large data sets, but you can instead sort using the identifiers, make a column that adds up the number of rows (or the number of rows with a particular outcome) since the last change in the identifier, and then filter for the last row before each change in the identifier (be wary of off-by-one errors). Then copy the result (values only) to a new sheet. 
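(The same aggregation is also a short script if you'd rather not do it in a spreadsheet; a minimal pandas sketch, where the filename and column names "Ingredient1".. and "Outcome" are hypothetical placeholders:)

```python
import pandas as pd

# Sketch of the aggregation described above, in pandas rather than a spreadsheet.
# Filename and column names are hypothetical placeholders.
df = pd.read_csv("alchemy_data.csv")

ingredient_cols = [c for c in df.columns if c.startswith("Ingredient")]

# One identifier per ingredient combination (concatenation, as above).
df["combo"] = df[ingredient_cols].astype(str).agg("|".join, axis=1)

# One row per combination, with counts of each outcome.
summary = (
    df.groupby("combo")["Outcome"]
      .value_counts()
      .unstack(fill_value=0)
      .reset_index()
)
print(summary.head())
```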

This also reduces the number of rows, though not enormously in this case.

Of course, in this case, it turns out that timing was relevant, not for outcomes but only for the ingredient selection (so I would have had to reconsider this assumption to figure out the ingredient selection).

Comment by simon on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset · 2024-06-19T16:22:39.158Z · LW · GW

I thought the flavour text was just right - I got it from the data, not the flavour text, and saw the flavour text as confirmation, as you intended.

 I was really quite surprised by how many players analyzed the data well enough to say "Barkskin potion requires Crushed Onyx and Ground Bone, Necromantic Power Potion requires Beech Bark and Oaken Twigs" and then went on to say "this sounds reasonable, I have no further questions."  (Maybe the onyx-necromancy connection is more D&D lore than most players knew?  But I thought that the bone-necromancy and bark-barkskin connections would be obvious even without that).

Illusion of transparency, I think: hints are harder than anyone making them thinks.

When I looked at the ingredients for a "barkskin potion", as far as I knew at this point the ingredients were arbitrary, so in fact I don't recall finding it suspicious at all. Then later I remember looking at the ingredients for a "necromantic power potion" and thinking something like... "uh... maybe wood stuff is used for wands or something to do necromancy?". It was only when I explicitly made a list of the ingredients for each potion type, rather than looking at each potion individually, and could see that everything else made sense, that I realized the twist.

Comment by simon on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-11T01:03:34.556Z · LW · GW

Post-solution extra details:

Quantitative hypothesis for how the result is calculated:

"Magical charge":  number of ingredients that are in the specific list in the parent comment. I'm copying the "magically charged" terminology from Lorxus.

"Eligible" for a potion: Having the specific pair of ingredients for the potion listed in the grandparent comment, or at the top of Lorxus' comment.

  1.  Get Inert Glop or Magical Explosion with probability depending on the magical charge.
    1. 0-1 -> 100% chance of Inert Glop
    2. 2 -> 50% chance of Inert Glop
    3.  3 -> neither, skip to next step
    4. 4 -> 50% chance of Magical Explosion
    5. 5+ -> 100% chance of Magical Explosion
  2. If didn't get either of those, get Mutagenic Ooze at 1/2 chance if eligible for two potions or 2/3 chance if eligible for 3 potions. (presumably would be n/(n+1) chance for higher n).
  3. If didn't get that either, randomly get one of the potions the ingredients are eligible for, if any.
  4. If not eligible for any potions, get Acidic Slurry.
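(A minimal simulation sketch of the hypothesized rule above, using the magical-ingredient list from my other comment on this post; PAIRS shows only a few of the essential pairs as placeholders for the full list:)

```python
import random

# Sketch of the hypothesized outcome rule above.
MAGICAL = {"Angel Feather", "Beholder Eye", "Demon Claw", "Dragon Scale",
           "Dragon Spleen", "Dragon Tongue", "Dragon's Blood", "Ectoplasm",
           "Faerie Tears", "Giant's Toe", "Troll Blood", "Vampire Fang"}
PAIRS = {  # essential pair -> potion; only a subset shown here
    frozenset({"Crushed Onyx", "Ground Bone"}): "Barkskin Potion",
    frozenset({"Beech Bark", "Oaken Twigs"}): "Necromantic Power Potion",
    frozenset({"Troll Blood", "Vampire Fang"}): "Regeneration Potion",
}

def brew(ingredients):
    charge = sum(1 for i in ingredients if i in MAGICAL)
    # Step 1: Inert Glop / Magical Explosion depending on magical charge.
    if charge <= 1 or (charge == 2 and random.random() < 0.5):
        return "Inert Glop"
    if charge >= 5 or (charge == 4 and random.random() < 0.5):
        return "Magical Explosion"
    # Step 2: Mutagenic Ooze with chance n/(n+1) if eligible for n >= 2 potions.
    eligible = [p for pair, p in PAIRS.items() if pair <= set(ingredients)]
    n = len(eligible)
    if n >= 2 and random.random() < n / (n + 1):
        return "Mutagenic Ooze"
    # Steps 3-4: a random eligible potion, else Acidic Slurry.
    return random.choice(eligible) if eligible else "Acidic Slurry"
```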

todo (will fill in below when I get results): figure out what's up with ingredient selection. 

edit after aphyer already posted the solution:

I didn't write up what I had found before aphyer posted the result, but I did notice the following:

  • hard 3-8 range in total ingredients
  • pairs of ingredients within selections being biased towards pairs that make potions
  • ingredient selections with 3 magical ingredients being much more common than ones with 2 or 4, and in turn more common than ones with 0-1 or 5+
    • and, this is robust when restricting to particular ingredients regardless of whether they are magical or not, though obviously with some bias as to how common 2 and 4 are
  • the order of commonness of ingredients holding actual magicalness constant is relatively similar restricted to 2 and 4 magic ingredient selections, though obviously whether is actually magical is a big influence here
  • I checked the distributions of total times a selection was chosen for different possible selections of ingredients, specifically for: each combination of total number of nonmagical ingredients and 0, 1 or 2 magical ingredients
    • I didn't get around to 3 and more magical ingredients, because I noticed that while for 0 and 1 magical ingredients the distributions looked Poisson-like (i.e. as would be expected if it were random, though in fact it wasn't entirely random), it definitely wasn't Poisson for the 2 ingredient case, and got sidetracked by trying to decompose into a Poisson distribution + extra distribution (and eventually by other "real life" stuff)
      • I did notice that this looked possibly like a randomish "explore" distribution which presumably worked the same as for the 0 and 1 ingredient case along with a non-random, or subset-restricted "exploit" distribution, though I didn't really verify this
Comment by simon on Why I don't believe in the placebo effect · 2024-06-10T15:13:40.818Z · LW · GW

Two different questions:

  1. Does receiving a placebo cause an actual physiological improvement?
  2. Does receiving a placebo cause the report of the patient's condition to improve?

The answer to the first question can be "no" while the second is still "yes", e.g. due to patients' self-reports of subjective conditions (pain, nausea, depression) being biased towards what they think the listener wants to hear, particularly if there's been some ritualized context (as discussed in kromem's comment) reinforcing that that's what they "should" say. 

Note that a similar effect could apply anywhere in the process where a subjective decision is made (e.g. if a doctor makes a subjective report on the patient's condition).

Comment by simon on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-10T08:01:42.231Z · LW · GW

Followup and actual ingredients to use:

Mutagenic Ooze is a failure mode that can happen if there are essential ingredients for multiple potions (can also get either potion or Inert Glop or Magical Explosion if eligible).

There are 12 "magical" ingredients. An ingredient is magical iff it is a product of a magical creature (i.e.: Angel Feather, Beholder Eye, Demon Claw, Dragon Scale, Dragon Spleen, Dragon Tongue, Dragon's Blood, Ectoplasm, Faerie Tears, Giant's Toe, Troll Blood, Vampire Fang).

Inert Glop is a possible outcome if there are 2 or fewer magical ingredients, and is guaranteed for 1 or fewer.

Magical Explosion is a possible outcome if there are 4 or more magical ingredients, and is guaranteed if there are 5 or more.

(Barkskin and Necromantic Power seem "harder" since their essential ingredients are both nonmagical, requiring more additional ingredients to get the magicness up.)

Therefore: success should be guaranteed if you select the 2 essential ingredients for the desired potion, plus enough other ingredients to have exactly 3 magical ingredients in total, while avoiding selecting both essential ingredients for any other potion. For the ingredients available:

To get "Necromantic Power Potion" (actual Barkskin):

Beech Bark + Oaken Twigs + Demon Claw + Giant's Toe + either Troll Blood or Vampire Fang

To get "Barkskin Potion" (actual Necromantic Power):

Crushed Onyx  + Ground Bone + Demon Claw + Giant's Toe + either Troll Blood or Vampire Fang

To get Regeneration Potion:

Troll Blood + Vampire Fang + either Demon Claw or Giant's Toe
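(As a sanity check, a small sketch verifying the selections above against the hypothesized rules: exactly 3 magical ingredients, the desired essential pair present, and no other potion's essential pair completed. Only the pairs relevant to these ingredients, taken from my earlier comment, are listed.)

```python
MAGICAL = {"Angel Feather", "Beholder Eye", "Demon Claw", "Dragon Scale",
           "Dragon Spleen", "Dragon Tongue", "Dragon's Blood", "Ectoplasm",
           "Faerie Tears", "Giant's Toe", "Troll Blood", "Vampire Fang"}
PAIRS = {  # only the essential pairs relevant to the ingredients used above
    "Barkskin": {"Crushed Onyx", "Ground Bone"},
    "Necromantic Power": {"Beech Bark", "Oaken Twigs"},
    "Regeneration": {"Troll Blood", "Vampire Fang"},
    "Rage": {"Badger Skull", "Demon Claw"},
    "Growth": {"Giant's Toe", "Redwood Sap"},
}

def looks_safe(selection, target):
    charge = len(selection & MAGICAL)
    other_pairs = [p for p, pair in PAIRS.items() if p != target and pair <= selection]
    return charge == 3 and PAIRS[target] <= selection and not other_pairs

print(looks_safe({"Beech Bark", "Oaken Twigs", "Demon Claw", "Giant's Toe",
                  "Troll Blood"}, "Necromantic Power"))  # True
print(looks_safe({"Troll Blood", "Vampire Fang", "Demon Claw"}, "Regeneration"))  # True
```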

I expect I'm late to the party here on the solution... (edit: yes, see abstractapplic's very succinct, yet sufficient-to-prove-knowledge comment, and Lorxus's much, much more detailed one)

Comment by simon on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T20:16:21.285Z · LW · GW

Maybe...

a love of ridiculous drama, a penchant for overcomplicated schemes, and a strong tendency to frequently disappear to conduct secretive 'archmage business'

Lying in order to craft a Necromantic Power Potion is certainly a bad sign, but still compatible with him being some other dark wizard type rather than the "Loathsome Lich" in particular. 

Regarding your proposal in your second comment: even if he is undead, he might not need to drink it to tell what it is. Still, could be worth a shot! (now three different potion types to figure out how to make...)

Comment by simon on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-09T18:34:02.641Z · LW · GW

Observations so far:

Each potion has two essential ingredients (necessary, but not sufficient).

Barkskin Potion: Crushed Onyx and Ground Bone

Farsight Potion: Beholder Eye and Eye of Newt

Fire Breathing Potion: Dragon Spleen and Dragon's Blood

Fire Resist Potion: Crushed Ruby and Dragon Scale

Glibness Potion: Dragon Tongue and Powdered Silver

Growth Potion: Giant's Toe and Redwood Sap

Invisibility Potion: Crushed Diamond and Ectoplasm

Necromantic Power Potion: Beech Bark and Oaken Twigs 

Rage Potion: Badger Skull and Demon Claw

Regeneration Potion: Troll Blood and Vampire Fang

Most of these make sense. Except...

I strongly suspect that Archmage Anachronos is trying to trick me into brewing him a Necromantic Power Potion, and has swapped around the "Barkskin Potion" and "Necromantic Power Potion" output reports. In character, I should obtain information from other sources to confirm this. For the purposes of this problem, I will both try to solve the stated problem and also figure out how to make a "Necromantic Power Potion" (i.e. an actual Barkskin Potion) to troll Anachronos.

Barkskin and Necromantic Power also seem to be the two toughest potions to make, never succeeding with just a third ingredient added.

An attempt that includes essential ingredients for multiple potions does not necessarily fail, some ingredient combinations can produce multiple potion types.

There are four failure modes:

Acidic Slurry never happens if the essential ingredients of any potion are included, but not all attempts that lack essential ingredients are Acidic Slurry. So, I'm guessing Acidic Slurry is a residual if it doesn't have the required ingredients for any potion and doesn't hit any of the other failure modes.

Inert Glop tends to happen with low numbers of ingredients; my initial guess is that it happens if the mixture is "not magical enough" in some sense, and my guess is that Magical Explosion is the opposite. Dunno about Mutagenic Ooze yet.

Back to Barkskin and Necromantic Power:

Either one has lots of options for making it reliably if you allow just one unavailable ingredient, but not (in the stats provided) while avoiding all the unavailable ingredients. Each has some available options that produce the potion type most of the time, but Inert Glop some of the time. There's one ingredient combination that has produced a "Barkskin Potion" the only time it was tried, but it has the essential ingredients for a Growth Potion, so it would likely not reliably produce a "Barkskin Potion".

Many ingredient combos, especially towards higher ingredient numbers, haven't been tried yet, so there's plenty of room to find an actually reliable solution if I can figure out more about the mechanics. The research will continue...

Comment by simon on D&D.Sci (Easy Mode): On The Construction Of Impossible Structures · 2024-05-17T06:29:59.715Z · LW · GW

Looks like architects apprenticed under B. Johnson or P. Stamatin always make impossible structures. 

Architects apprenticed under M. Escher, R. Penrose or T. Geisel never do.

Self-taught architects sometimes do and sometimes don't. It doesn't initially look promising to figure out who will or won't in this group - many cases of similar proposals sometimes succeeding and sometimes failing.

Fortunately, we do have 5 architects (D,E,G,H,K) apprenticed under B. Johnson or P. Stamatin, so we can pick the 4 of them likely to have the lowest cost proposals.

Cost appears to depend primarily (only?) on the materials used. 

dreams < wood < steel < glass < silver < nightmares

Throwing out architect G's glass and nightmares proposal as too expensive, that leaves us with D,E,H,K as the architect selections.

(edit: and yes, basically what everyone else said before me)

Comment by simon on Is being a trans woman (or just low-T) +20 IQ? · 2024-04-25T05:01:54.271Z · LW · GW

I always assumed that, since high IQ is correlated with high openness, the higher openness would be the cause of higher likelihood of becoming trans. 

(or, some more general situation where IQ is causing transness more than the other way around; e.g. high scores on IQ tests might be caused to some extent by earnestness/intensity etc., which could also cause more likelihood of becoming trans)

Comment by simon on D&D.Sci: The Mad Tyrant's Pet Turtles · 2024-04-08T04:27:04.804Z · LW · GW

So I had some results I didn't feel were complete enough to make a comment on (in the sense that subjectively I kept feeling there was some follow-on thing I should check to verify them or make sense of them), then got sidetracked by various stuff, including planning and now going on a sacred pilgrimage to see the eclipse. Anyway:

all of these results relate to the "main group" (non-fanged, 7-or-more segment turtles):

Everything seems to have some independent relation with weight (except nostril size afaik, but I didn't particularly test nostril size). When you control for other stuff, wrinkles and scars (especially scars) become less important relative to segments. 

The effect of abnormalities seems suspiciously close to 1 lb on average per abnormality (so, subjectively I think it might be 1). Adding abnormalities has an effect that looks like smoothing (in a biased manner so as to increase the average weight): the weight distribution peak gets spread out, but the outliers don't get proportionately spread out.  I had trouble finding a smoothing function* that I was satisfied exactly replicated the effect on the weight distribution however. This could be due to it not being a smoothing function, me not guessing the correct form, or me guessing the correct form and getting fooled by randomness into thinking it doesn't quite fit.

For green turtles with zero miscellaneous abnormalities, the distribution of scars looked somewhat close to a Poisson distribution. For the same turtles, the distribution of wrinkles on the other hand looked similar but kind of spread out a bit...like the effect of a smoothing function. And they both get spread out more with different colours. Hmm. Same spreading happens to some extent with segments as the colours change.

On the other hand, segment distribution seemed narrower than Poisson, even one with a shifted axis, and the abnormality distribution definitely looks nothing like Poisson (peaks at 0, diminishes far slower than a 0-peak Poisson).

Anyway, on the basis of not very much clear evidence but on seeming plausibility, some wild speculation:

I speculate there is a hidden variable: age. Wrinkles and greyer colour (among non-fanged turtles) could be proxies for age with no direct effect (the names of those characteristics are also suggestive). Scars are likely a weaker proxy for age, also with no direct effect. I guess segments likely do have some direct effect, while also being a (weak, like scars) proxy for age. Abnormalities clearly have a direct effect. I have not properly tested interactions between these supposed direct effects (age, segments, abnormalities), but if the abnormality effect doesn't stack additively with the other effects, it would be harder for the 1-lb-per-abnormality size of the abnormality effect to be a non-coincidence.

So, further wild speculation: the age effect on weight could also be a smoothing function (though it looks like the high-weight tail is thicker for greenish-gray; does that suggest it is not a smoothing function?).

Unknown: is there an inherent uncertainty in the weight given the characteristics, or does there merely appear to be one because the age proxies are unreliable indicators of age? Is that even distinguishable? 

* By "smoothing function" I think I mean another random variable that you add to the first one, where this other random variable takes values within a relatively narrow range (e.g. a uniform distribution from 0.0 to 2.0, or e.g. a 50% chance of being 0.2 and a 50% chance of being 1.8).
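(To illustrate what I mean, a quick sketch with made-up stand-in distributions rather than the actual turtle data:)

```python
import numpy as np

# Illustration of the footnote's "smoothing function": add an independent,
# narrow-range random variable to a (stand-in, made-up) base weight distribution.
rng = np.random.default_rng(0)

base = rng.normal(20.0, 2.0, size=100_000)         # stand-in base weights (lb)
smoother = rng.uniform(0.0, 2.0, size=base.size)   # narrow-range addition, mean 1
smoothed = base + smoother

bins = np.arange(10.0, 32.0, 0.5)
for name, sample in [("base", base), ("smoothed", smoothed)]:
    counts, _ = np.histogram(sample, bins=bins)
    print(name, "peak bin count:", counts.max(), "mean:", round(sample.mean(), 2))
# Expected: the peak flattens (lower max bin count) and the mean shifts up by
# the smoother's mean, while the far tails barely widen.
```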

Anyway, this all feels figure-outable even though I haven't figured it out yet. Some guesses where I throw out most of the above information (apart from prioritization of characteristics) because I haven't organized it to generate an estimator, and just guess ad hoc based on similar datapoints, plus Flint and Harold copied from above:

Abigail 21.6, Bertrand 19.3, Chartreuse 27.7, Dontanien 20.5, Espera 17.6, Flint 7.3, Gunther 28.9, Harold 20.4, Irene 26.1, Jacqueline 19.7

Comment by simon on Beauty and the Bets · 2024-03-31T20:19:34.239Z · LW · GW

Well, as you may see it's also is not helpful

My reasoning explicitly puts instrumental rationality ahead of epistemic. I hold this view precisely to the degree to which I do in fact think it is helpful.

The extra category of a "fair bet" just adds another semantic disagreement between halfers and thirders. 

It's just a criterion by which to assess disagreements, not adding something more complicated to a model.

Regarding your remarks on these particular experiments:

If someone thinks the typical reward structure is some reward structure, then they'll by default guess that a proposed experiment has that reward structure.

This reasonably can be expected to apply to halfers or thirders. 

If you convince me that halfer reward structure is typical, I go halfer. (As previously stated since I favour the typical reward structure). To the extent that it's not what I would guess by default, that's precisely because I don't intuitively feel that it's typical and feel more that you are presenting a weird, atypical reward structure!

And thirder utilities are modified during the experiment. They are not just specified by a betting scheme, they go back and forth based on the knowledge state of the participant - behave the way probabilities are supposed to behave. And that's because they are partially probabilities - a result of incorrect factorization of E(X).

Probability is a mathematical concept with very specific properties. In my previous post I talk about it specifically and show that thirder probabilities for Sleeping Beauty are ill-defined.

I've previously shown that some of your previous posts incorrectly model the Thirder perspective, but I haven't carefully reviewed and critiqued all of your posts. Can you specify exactly what model of the Thirder viewpoint you are referencing here? (This will not only help me critique it but also help me determine what exactly you mean by the utilities changing in the first place, i.e. do you count Thirders evaluating the total utility of a possibility branch more highly when there are more of them as a "modification" or not? I would not consider this a "modification".)

Comment by simon on D&D.Sci: The Mad Tyrant's Pet Turtles · 2024-03-31T19:35:38.086Z · LW · GW

updates:

In the fanged subset:

I didn't find anything that affects weight of fanged turtles independently of shell segment number. The apparent effect from wrinkles and scars appears to be mediated by shell segment number. Any non-shell-segment-number effects on weight are either subtle or confusingly change directions to mostly cancel out in the large scale statistics.

Using linear regression, if you force intercept=0, then you get a slope close to 0.5 (i.e. avg weight = 0.5*(number of shell segments), as suggested by qwertyasdef), and that's tempting to go with for the round number; but if you don't force intercept=0, then a zero intercept is well outside the error bars for the intercept (though the fitted intercept is still low: 0.376-0.545 at 95% confidence). If you don't force intercept=0, then the slope is more like 0.45 than 0.5. There is also a decent amount of variation, which increases in a manner that could plausibly be linear in the number of shell segments (not really that great-looking a fit to a straight line with intercept 0, but plausibly close enough; I didn't do the math). Plausibly this could be modeled by each shell segment having a weight drawn from a distribution (average 0.45) and the total weight being the sum of the weights for each segment. If we assume some distribution in discrete 0.1lb increments, the per-segment variance looks to be roughly the amount supplied by a d4. 
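(For reference, a minimal sketch of the two fits; the filename and column names are hypothetical placeholders:)

```python
import numpy as np
import pandas as pd

# Sketch of the two regressions described above (fanged turtles only).
# Filename and column names are hypothetical placeholders.
turtles = pd.read_csv("turtles.csv")
fanged = turtles[turtles["Fangs?"] == "Yes"]

x = fanged["Shell Segments"].to_numpy(dtype=float)
y = fanged["Weight"].to_numpy(dtype=float)

# Free intercept: least squares on [1, x].
intercept, slope = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]),
                                   y, rcond=None)[0]

# Forced zero intercept: regress y on x alone.
slope_origin = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

print(f"free fit: weight ~ {intercept:.2f} + {slope:.2f} * segments")
print(f"through origin: weight ~ {slope_origin:.2f} * segments")
```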

So, I am now modeling fanged turtle weight as a 0.5 base weight plus a contribution of 0.1*(1d4+2) for each segment. And no, I am not very confident that this has anything to do with the real answer, but it seems plausible at least and seems to fit pretty well.

The sole fanged turtle among the Tyrant's pets, Flint, has a massive 14 shell segments and at that number of segments the cumulative probability of the weight being at or below the estimated value passes the 8/9 threshold at 7.3 lbs, so that's my estimate for Flint.
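(A quick check of that number under the model above, computing the exact distribution of the 14 per-segment d4 contributions by convolution:)

```python
import numpy as np

# Model from above: weight = 0.5 + sum over segments of 0.1 * (1d4 + 2).
segments = 14                       # Flint's shell segment count
d4 = np.full(4, 0.25)               # P(d4 = 1), ..., P(d4 = 4)

dist = np.array([1.0])
for _ in range(segments):
    dist = np.convolve(dist, d4)    # exact distribution of the sum of d4 rolls
rolls = np.arange(segments, segments + len(dist))      # possible d4 totals
weights = 0.5 + 0.1 * (rolls + 2 * segments)           # corresponding weights
cdf = np.cumsum(dist)

# Smallest weight whose cumulative probability reaches the 8/9 threshold.
estimate = weights[np.searchsorted(cdf, 8 / 9)]
print(round(estimate, 1))           # 7.3 under this model
```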

In the non-fanged, more than 6 segment main subset:

Shell segment number doesn't seem to be the dominant contributor here; all the numerical characteristics correlate with weight. Will investigate further.

Abnormalities don't seem to affect or be affected by anything but weight. This is not only useful to know for separating abnormality-related and other effects on weight, but also implies (I think) that nothing is downstream of weight causally, since that would make weight act as a link for correlations with other things. 

This doesn't rule out the possibility of some other variable (e.g. age) that other weight-related characteristics might be downstream of. More investigation to come. I'm now holding off on reading others' comments (beyond what I read at the time of my initial comment) until I have a more complete answer myself.

Comment by simon on D&D.Sci: The Mad Tyrant's Pet Turtles · 2024-03-30T07:16:33.253Z · LW · GW

Thanks abstractapplic! Initial observations:

There are multiple subpopulations, and at least some that are clearly disjoint.

The 3167 fanged turtles are all gray, and only fanged turtles are gray. Fanged turtles always weigh 8.6lb or less. Within the fanged turtles, it seems shell segment number is pretty decently correlated with weight. Wrinkles and scars have weaker correlations with weight but also correlate with shell segment number, so I'm not sure they have an independent effect; will have to disentangle.

Non-fanged turtles always weigh 13.0 lbs or more. There are no turtles weighing between 8.6lb and 13.0lb.

The 5404 turtles with exactly 6 shell segments all have 0 wrinkles or abnormalities, are green, have no fangs, have normal-sized nostrils, and weigh exactly 20.4lb. None of that is unique to 6-shell-segment turtles, but that last bit makes guessing Harold's weight pretty easy.

Among the 21460 turtles that don't belong to either of those groups, all of the numerical characteristics correlate with weight, and notably the number of abnormalities doesn't seem to correlate with the other numerical characteristics, so it likely has some independent effect. Grayer colours tend to have higher weight, but also correlate with other things that seem to affect weight, so I will have to disentangle.

edit: both qwertyasdef and Malentropic Gizmo identified these groups before my comment including 6-segment weight, and qwertyasdef also remarked on the correlation of shell segment number to weight among fanged turtles. 

Comment by simon on Beauty and the Bets · 2024-03-28T18:50:02.529Z · LW · GW

Throughout your comment you've been saying a phrase "thirders odds", apparently meaning odds 1:2, not specifying whether per awakening or per experiment. This is underspecified and confusing category which we should taboo. 

Yeah, that was sloppy language, though I do like to think more in terms of bets than you do. One of my ways of thinking about these sorts of issues is in terms of "fair bets" - each person thinks a bet with payoffs that align with their assumptions about utility is "fair", and a bet with payoffs that align with different assumptions about utility is "unfair". Edit: to be clear, a "fair" bet for a person is one where the payoffs are such that the betting odds at which they break even match the probabilities that that person would assign.

I do not claim that. I say that in order to justify not betting differently, thirders have to retroactively change the utility of a bet already made:

I critique thirdism not for making different bets - as the first part of the post explains, the bets are the same, but for their utilities not actually behaving like utilities - constantly shifting back and forth during the experiment, including shifts backwards in time, in order to compensate for the fact that their probabilities are not behaving as probabilities - because they are not sound probabilities as explained in the previous post.

Wait, are you claiming that thirder Sleeping Beauty is supposed to always decline the initial per experiment bet - before the coin was tossed at 1:1 odds? This is wrong - both halfers and thirders are neutral towards such bets, though they appeal to different reasoning why.

OK, I was also being sloppy in the parts you are responding to.

Scenario 1: bet about a coin toss, nothing depending on the outcome (so payoff equal per coin toss outcome)

  • 1:1

Scenario 2: bet about a Sleeping Beauty coin toss, payoff equal per awakening

  • 2:1 

Scenario 3: bet about a Sleeping Beauty coin toss, payoff equal per coin toss outcome 

  • 1:1

It doesn't matter if it's agreed to before or after the experiment, as long as the payoffs work out that way. Betting within the experiment is one way for the payoffs to more naturally line up on a per-awakening basis, but it's only relevant (to bet choices) to the extent that it affects the payoffs.
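(To spell out the arithmetic behind those scenarios, a minimal sketch of the breakeven stake for a 1-unit payout on Heads under the two payoff schemes:)

```python
from fractions import Fraction

# Fair coin; Heads -> 1 awakening, Tails -> 2 awakenings. The bet pays 1 unit
# if the coin is Heads; the stake s is paid at every awakening (Scenario 2)
# or once per coin toss outcome (Scenario 3).
def breakeven_stake(stake_per_awakening: bool) -> Fraction:
    tails_stakes = 2 if stake_per_awakening else 1
    # Solve (1/2)*(1 - s) + (1/2)*(-tails_stakes * s) = 0 for s.
    return Fraction(1, 1 + tails_stakes)

print(breakeven_stake(True))    # 1/3 -> 2:1 odds (Scenario 2)
print(breakeven_stake(False))   # 1/2 -> 1:1 odds (Scenario 3)
```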

Now, the conventional Thirder position (as I understand it) consistently applies equal utilities per awakening when considered from a position within the experiment.

I don't actually know what the Thirder position is supposed to be from a standpoint before the experiment, but I see no contradiction in assigning equal utilities per awakening from the before-experiment perspective as well. 

As I see it, Thirders will only regret a bet (in the sense of considering it a bad choice to enter into ex ante given their current utilities) if you do some kind of bait and switch where you don't make it clear what the payoffs were going to be up front.

 But what I'm pointing at, is that thirdism naturally fails to develop an optimal strategy for per experiment bet in technicolor problem, falsly assuming that it's isomorphic to regular sleeping beauty.

Speculation; have you actually asked Thirders and Halfers to solve the problem (while making the reward structure clear)? Note that if you don't make clear what the reward structure is, Thirders are more likely to misunderstand the question asked if, as in this case, the reward structure is "fair" from the Halfer perspective and "unfair" from the Thirder perspective.

Technicolor and Rare Event problems highlight the issue that I explain in Utility Instability under Thirdism - in order to make optimal bets thirders need to constantly keep track of not only probability changes but also utility changes, because their model keeps shifting both of them back and forth and this can be very confusing. Halfers, on the other hand, just need to keep track of probability changes, because their utility are stable. Basically thirdism is strictly more complicated without any benefits and we can discard it on the grounds of Occam's razor, if we haven't already discarded it because of its theoretical unsoundness, explained in the previous post.

A Halfer has to discount their utility based on how many of them there are; a Thirder doesn't. It seems to me, contrary to your perspective, that Thirder utility is more stable.

Halfer model correctly highlights the rule how to determine which cases these are and how to develop the correct strategy for betting. Thirder model just keeps answering 1/3 as a broken clock.

... and in my hasty reading and response I misread the conditions of the experiment (it's a "Halfer" reward structure again). (As I've mentioned before in a comment on another of your posts, I think Sleeping Beauty is unusually ambiguous, so both Halfer and Thirder perspectives are viable. But I lean toward the general perspectives of Thirders on other problems (e.g. SIA seems much more sensible (edit: in most situations) to me than SSA), so Thirderism seems more intuitive to me). 

Thirders can adapt to different reward structures but need to actually notice what the reward structure is! 

What do you still feel that is unresolved?

The things mentioned in this comment chain. Which actually doesn't feel like all that much; it feels like there are maybe one or two differences in philosophical assumptions that are creating this disagreement (though maybe we aren't getting at the key assumptions).

Edited to add: The criterion I mainly use to evaluate probability/utility splits is typical reward structure - you should assign probabilities/utilities such that a typical reward structure seems "fair", so you don't wind up having to adjust for different utilities when the rewards have the typical structure (you do have to adjust if the reward structure is atypical, and thus seems "unfair"). 

This results in me agreeing with SIA in a lot of cases. An example of an exception is Boltzmann brains. A typical reward structure would give no reward for correctly believing that you are a Boltzmann brain. So you should always bet in realistic bets as if you aren't a Boltzmann brain, and for this to be "fair", I set P=0 instead of SIA's U=0. I find people believing silly things about Boltzmann brains, like taking it to be evidence against a theory if that theory proposes that a lot of Boltzmann brains exist. I think more acceptance of setting P=0 instead of U=0 here would cut that nonsense off. To be clear, normal SIA does handle this case fine (a theory predicting Boltzmann brains is not evidence against it), but setting P=0 would make it more obvious to people's intuitions.

In the case of Sleeping Beauty, this is a highly artificial situation that has been pared down of context to the point that it's ambiguous what would be a typical reward structure, which is why I consider it ambiguous.

Comment by simon on Beauty and the Bets · 2024-03-27T17:45:40.506Z · LW · GW

The central point of the first half or so of this post  - that for E(X) = P(X)U(X) you could choose different P and U for the same E so bets can be decoupled from probabilities - is a good one.

I would put it this way: choices and consequences are in the territory*; probabilities and utilities are in the map.

Now, it could be that some probability/utility breakdowns are more sensible than others based on practical or aesthetic criteria, and in the next part of this post ("Utility Instability under Thirdism") you make an argument against thirderism based on one such criterion.

However, your claim that Thirder Sleeping Beauty would bet differently before and after the coin toss is not correct. If Sleeping Beauty is asked before the coin toss to bet based on the same reward structure as after the toss, she will bet the same way in each case - i.e. Thirder Sleeping Beauty will bet Thirder odds even before the experiment starts, if the coin toss being bet on is specifically the one in this experiment and the reward structure is such that she will be rewarded equally (as assessed by her utility function) for correctness in each awakening.

Now, maybe you find this dependence on what the coin will be used for counterintuitive, but that depends on your own particular taste.

Then, the "technicolor sleeping beauty" part seems to make assumptions where the reward structure is such that it only matters whether you bet or not in a particular universe and not how many times you bet. This is a very "Halfer" assumption on reward structure, even though you are accepting Thirder odds in this case! Also, Thirders can adapt to such a reward structure as well, and follow the same strategy.  

Finally, on Rare Event Sleeping Beauty, it seems to me that you are biting the bullet here to some extent in arguing that this is not a reason to favour thirderism.

I think, we are fully justified to discard thirdism all together and simply move on, as we have resolved all the actual disagreements.

uh....no. But I do look forward to your next post anyway.

*edit: to be more correct, they're less far up the map stack than probabilities and utilities. Making this clarification just in case someone might think from that statement that I believe in free will (I don't).

Comment by simon on Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI · 2024-02-18T07:19:21.233Z · LW · GW

I think there's a (kind of) loophole here, where we use an "abstract hypothetical" model of a hypothetical future, and optimize for consequences our actions for that hypothetical. Is this what you mean by "understood in abstract terms"? 

More or less, yes (in the case of engineering problems specifically, which I think is more real-world-oriented than most science AI).

The part I don't understand is why you're saying that this is "simpler"? It seems equally complex in kolmogorov complexity and computational complexity.

What I'm saying is "simpler" is this: given a problem that doesn't need to depend on the actual effects of the outputs on the future of the real world (operating in a simulation is an example, though one that could become riskily close to the real world depending on the information taken into account by the simulation; it might not be a good idea to include highly detailed political risks of other humans thwarting construction in a fusion reactor construction simulation, for example), it is simpler for the AI to solve that problem without taking into consideration the effects of its output on the future of the real world than it is to take those effects into account anyway.