Aaro Salosensaari's Shortform 2021-01-10T11:44:21.824Z


Comment by Aaro Salosensaari (aa-m-sa) on Sabien on "work-life" balance · 2021-05-27T20:13:09.898Z · LW · GW

>And as Duncan is getting at, employment has changed a lot since the term was coined and there's now a lot more opportunity for jobs and work to be aligned with a person's personal goals.

I can agree, but I am skeptical that this ...integratedness(?) is actually a good thing for everyone. From the point of view of the old "work vs life" people who valued the life part, it probably looks like losing if what they get is "your work is supposed to be an integral part of what you choose to do with your life" while the options of where and what kind of work to do are not that different from what they were some decades ago. And even the new^1 options present trade-offs.

Maybe there are some people whose true calling is to found a startup or develop mastery in some particular technology stack or manage projects that create profit for stockholders. However, if the job market environment is shaped by it so that every job expects an applicant whose life goals are integrally aligned to performing the job, it plausibly affects what kind of goals people think are thinkable when they think of their life and careers, because it certainly affects how they present themselves to the hiring committee or people with equivalent power.


Another point, concerning integration of work in one's life. I found myself thinking of the movie Tokyo Story (Tokyo Monogatari, 1953), which I saw maybe two years ago. While the story is not exactly about jobs, it explores how modernity (the contemporary, post-WW2 kind of modernity in particular) intersects with Japanese society through the lens of a single Japanese family. The many characters in the film work various jobs: there is a son who is a physician (the kind who does visits and has a private practice), a daughter who is running a beauty salon (a family-business-like setup; if the husband did something not affiliated with the business, I forget what), and a daughter-in-law who is a menial clerk at a corporate business.

The part where this musing connects to anything: while writing the first part of the comment, I started to think about what the personal goals of the physician and the beauty business owner are. If I recall, both of them are the kind of person who wants to strive and get forward and upward in their life in Tokyo (this leads to one of the conflicts in the film) and view their jobs as integral to that goal. Their jobs are quite integrated into their lives in concrete terms: both practice at their homes. Both kinds of profession probably predate the idea of work-life balance. One could replace the beauty salon with something a bit more traditional, like a restaurant or an inn, without much difference to their relevance to the story, at the very least. The character with the clearest difference between time off and time at work is the office clerk. Which actually connects to another plot point. I recommend the movie.

Maybe the big difference comes implied in the "good for the world in ways I care about" angle. There is no crusader or activist, someone who seeks to make change in the world instead of making it well within it. Today the doctor would be likely to emphasize how he wants to help people by being a good doctor, the family business would have a thing (natural beauty products that help the environment, powered by solar!), and the big corp would have a mission, too. The owner of the corp, several echelons above, might even be serious about it. Nobody goes to found a start-up.

So, I guess my point is that there have always been people who don't view their work and non-work lives as fundamentally different kinds of thing.

1: The newness might be debatable, though. I don't think starting a technology business because you have skills and ideas is something truly new in the US, I think both Edison and Tesla tried their hands at it and I have read Tesla's interviews which indicate he thought it was for the betterment of mankind? It would have been with the spirit of the times.

Comment by Aaro Salosensaari (aa-m-sa) on Chinese History · 2021-05-16T21:53:37.350Z · LW · GW

>The Church of England still has bishops that vote in the house of lords. 

That is an argument for a particular church-state relationship. The original claim spoke of entanglement (in the present tense!). For reference, the archbishop of the Evangelical-Lutheran Church in Finland has always been appointed by whoever is the head of state since Gustav I Vasa embraced Protestantism, and the church was until recently an official state apparatus and to some extent still is. The Holy See has had negligible effect here for centuries, and some historians maintain that most of the time the influence tended to flow from the state to the Ev. Lut. church rather than the other way around, despite the overall symbiosis between the two.

The aspects of political power in such conflicts were not alien to the Catholic cardinal Richelieu of France, who financed Gustav II Adolf's war against the Catholic League in Germany while repressing the Huguenots at home.

It is very enlightening to read the other responses below concerning the history of Confucianism, and I can be convinced that China & Confucianism have a very different history regarding the matters we (or I) often pattern-match to religion. And it makes sense that the peculiarities of the Taiping rebellion or the CCP's current positions concerning Catholicism are motivated by them being in contact with European concepts of religion only relatively recently on historical timescales. However:

In my understanding, the conflict between the CCP and the Catholic church indicates that the party views Catholicism in terms of national identity and temporal power in ways both different and not so different from how Catholicism was viewed in the Protestant countries of the 17th/18th century. The CCP apparently does not want Catholicism, or specifically the Church of Rome's interpretation, to have a significant presence in the local thoughtspace, presumably in favor of something else which plausibly serves an analogous role (otherwise there would be no competition about that thoughtspace).

In this case, I find it likely that the parable about fish and water also applies to birds and air: there are commonalities despite the differences, even though water is not air, and the birds have more reason to differentiate the air from the ground. Maybe the Chinese are more like rockets in the vacuum of space, but that would take more explaining.

Writing out the argument how there is no entanglement and why the clarity arises (and why linking to Sun Tzu is supposed to back that argument) could possibly help here.

Consequently, the original remark and some of the subsequent discussion read to me as "booing" all things that get called "religions" and cheering for the Chinese tradition as better for not being a religion.

Comment by Aaro Salosensaari (aa-m-sa) on What weird beliefs do you have? · 2021-05-16T12:51:31.887Z · LW · GW

I sort of believe in something like this, except without the magical bits. It motivates me to vote in elections and follow the laws even when there is no effective enforcement. Maybe it is a consequence of reading Pratchett's Discworld novels when I was at an impressionable age.

My mundane explanation (or rationalization) is a bit difficult to write, but I believe it is because of:

>It gets in people's minds.

When people believe something, it affects their behavior. Thus memetic phenomena can have real effects.

As an example I feel is related to this, I half-believe that believing in magical rationalizations[1] can also enable good societal outcomes, as long as enough people believe that also other people believe them, and it facilitates trust.

Have you read Joseph Conrad's Nostromo? It deals with how valuable things and what they do affect people's behavior, both on the societal scale (how corruption in the imaginary South American state of Costaguana seeds more corruption) and on the personal scale.

[1] "if I vote in the national elections, it somehow makes a difference, maybe because then more people like me are encouraged to vote in elections" and "if I obey the law of not serving alcohol to underage people when there is probably no harm to them from it, or stop at the traffic signs on a deserted street at midnight, the world somehow becomes a better place because the world would be a better place if more people followed the laws".

Comment by Aaro Salosensaari (aa-m-sa) on What weird beliefs do you have? · 2021-05-16T12:15:42.172Z · LW · GW

I agree with Phil that this sounds very ... counterintuitive. Usually nothing is free, and even with free things there are consequences or some sort of externality.

However, I recently read an argument by a Finnish finance podcaster, who argued that while the intuition might be true and the government debt system probably is not sustainable and is going to have some kind of mess-up in the long term, not participating may put your country at a disadvantage compared to countries that take the "free" money and invest it, and thus have more assets when it all falls down.

Comment by Aaro Salosensaari (aa-m-sa) on Chinese History · 2021-05-12T23:09:36.009Z · LW · GW

I realize this is a 3mo old comment.

>Nor does China entangle religion with politics to the same extent you find in the Christian and Islamic worlds. This makes it easier to think about conflicts. I feel it produces a better understanding of political theory and strategy.

Does not entangle? I thought China is the only country of note around that enforces its own version of the Catholic church with Chinese characteristics (the translation used by Wikipedia is "Chinese Patriotic Catholic Church", apparently excommunicated by the pope in Rome). One can discuss how it compares to the Church of England's historical past or, more recently, the Protestant skepticism about JFK's Catholicism, but it is kind of remarkable in its own right.

(edit. Thinking about the little bit I do know about Chinese history ... Taiping Rebellion?)

Comment by Aaro Salosensaari (aa-m-sa) on Predictive Coding has been Unified with Backpropagation · 2021-04-05T21:13:27.387Z · LW · GW

Sure, but statements like

>ANNs are built out of neurons. BNNs are built out of neurons too.

are imprecise and possibly imprecise enough to be also incorrect if it turns out that biological neurons do something different than perceptrons that is important. Without making the exact arguments and presenting evidence in what respects the perceptron model is useful, it is quite easy to bake in conclusions along the lines of "this algorithm for ANNs is a good model of biology" in the assumptions "both are built out of neurons".

Comment by Aaro Salosensaari (aa-m-sa) on Technological stagnation: Why I came around · 2021-01-26T21:22:26.193Z · LW · GW

>Home delivery is way cheaper than it used to be.

I am going to push back a little on this one, and ask for context and numbers? 

As some of my older relatives commented when Wolt became popular here, before people started going to supermarkets, it was common for shops to have a delivery / errand boy (this would have been the 1950s, and more prevalent before WW2). It is one thing that stands out when reading biographies: teenage Harpo Marx dropped out of school and did odd jobs as an errand boy; they are a ubiquitous part of the background in Anne Frank's diaries; and so on.

Maybe it was proportionally more expensive (relative to cost of purchase), but on the other hand, from the descriptions it looks like the deliveries were done by teenage/young men who were paid peanuts.

Comment by Aaro Salosensaari (aa-m-sa) on Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain · 2021-01-26T18:38:12.575Z · LW · GW

Thanks for writing this, the power-to-weight statistics are quite interesting. I have another, longer reply with my own take (edit. comments about the graph, that is) in the works, but while writing it, I started to wonder about a tangential question:

>I am saying that many common anti-short-timelines arguments are bogus. They need to do much more than just appeal to the complexity/mysteriousness/efficiency of the brain; they need to argue that some property X is both necessary for TAI and not about to be figured out for AI anytime soon, not even after the HBHL milestone is passed by several orders of magnitude.

I am not super familiar with the state of discussion and literature nowadays, but I was wondering what these anti-short-timelines arguments that appeal to the general complexity/mysteriousness are, and how common they are. Are they common in popular discourse, or common among people considered worth taking seriously?

Data efficiency, for example, is already a much more specific feature than handwave-y "human brain is so complex", and thus as you demonstrate, it becomes much easier to write a more convincing argument from data efficiency than mysterious complexity.

Comment by Aaro Salosensaari (aa-m-sa) on Aaro Salosensaari's Shortform · 2021-01-12T07:08:34.632Z · LW · GW

Eventually, yes, it is related to arguments concerning people. But I was curious about what aesthetics remain after I try to abstract away the messy details. 

Comment by Aaro Salosensaari (aa-m-sa) on Aaro Salosensaari's Shortform · 2021-01-12T07:05:34.821Z · LW · GW

>Is this a closed environment, that supports 100000 cell-generations?

Good question! No. I was envisioning it as a system where a constant population of 100 000 would be viable. (RA pipettes in a constant amount of nutritional fluid every day or something).  Now that you asked the question, it might make more sense to investigate this assumption more.

Comment by Aaro Salosensaari (aa-m-sa) on Aaro Salosensaari's Shortform · 2021-01-10T11:44:23.710Z · LW · GW

I have a small intuition pump I am working on, and thought maybe others would find it interesting.

Consider a habitat (say, a Petri dish) that in any given moment has maximum carrying capacity for supporting 100 000 units of life (say, cells), and two alternative scenarios.

Scenario A. Initial population of 2 cells grows exponentially, one cell dying but producing two descendants each generation. After the 16th generation, the habitat overflows, and all cells die in overpopulation. The population experienced a total of 262 142 units of flourishing.

Scenario B. More or less stable population of x cells (x << 100 000 units, say, approximately 20) continues for n generations, for total of x * n units of flourishing until the habitat meets its natural demise after n generations.

For some reason or other, I find scenario B much more appealing even for relatively small values of n. For example, while n=100 000 (2 000 000 units of total flourishing) would be obviously better for a utilitarian who cares about the total, equally weighted sum of flourishing units (utilitons), I personally find that already a meager n=100 (x*n = 2000) sounds better than A.

Maybe this is just because of me assuming that because n=100 is possible, also larger n sounds possible. Or maybe I am utiliton-blind and just think 100 > 17. Or maybe something else.
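The arithmetic behind the two scenarios can be checked with a short sketch. It assumes, as the numbers above suggest, that the overflowing generation is counted in scenario A's total; the function names and the break-even calculation are illustrative additions, not part of the original intuition pump.

```python
CAPACITY = 100_000  # carrying capacity of the habitat, from the text


def scenario_a(initial=2, capacity=CAPACITY):
    """Doubling population; returns the generation sizes up to and
    including the first generation that exceeds the capacity, and their sum."""
    sizes = [initial]
    while sizes[-1] <= capacity:
        sizes.append(sizes[-1] * 2)
    return sizes, sum(sizes)


def scenario_b(x=20, n=100):
    """Stable population of x cells living for n generations."""
    return x * n


sizes, total_a = scenario_a()
print("A:", len(sizes), "generations, last size", sizes[-1], "total", total_a)
print("B with n=100:", scenario_b(20, 100))
print("B with n=100 000:", scenario_b(20, 100_000))

# Smallest n for which scenario B (x=20) has a larger total than scenario A.
break_even_n = total_a // 20 + 1
print("B overtakes A in total units at n =", break_even_n)
```

Under these assumptions, scenario A lasts 17 generations, and scenario B with x=20 needs a bit over 13 000 generations before it wins on total units alone, which is what makes the preference for n=100 a non-utilitarian one.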

Background. In a recent discussion with $people, I tried to argue why I find a long term existence of a limited human population much more important than mere potential size of total experienced human flourishing or something more abstract. I have not tried to "figure in" more details, but somethings I have thought about adding in, is various probabilistic scenarios / uncertainty about total carrying capacity. No, I have not read (/remember reading) previous relevant LW posts, if you can think of something useful / relevant, please link it! 

Comment by Aaro Salosensaari (aa-m-sa) on How long does it take to become Gaussian? · 2020-12-10T10:46:57.119Z · LW · GW

I agree the non-IID result is quite surprising. Careful reading of the Berry-Esseen theorem gives some insight into the limit behavior. In the IID case, the approximation error is bounded by constants / $\sqrt{n}$ (where the constants are proportional to third moment / $\sigma^3$).

The not-IID generalization for n distinct distributions has the bound more or less (sum of third moments) divided by (sum of sigma^2)^(3/2), which is surprisingly similar to the IID special case. My reading of it suggests that if the sigmas / third moments of all n distributions are bounded below / above by some sigma / phi (which of course happens when you pick any finite number of distributions by hand), the error again diminishes at rate $1/\sqrt{n}$ if you squint your eyes.
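Written out, the squinting goes roughly as follows. This is a sketch of the standard statements as I recall them, not a quotation: $C_0$ and $C_1$ are absolute constants, $F_n$ is the CDF of the normalized sum, $\Phi$ is the standard normal CDF, and $\rho_i = E|X_i - \mu_i|^3$ is my notation for the third absolute central moments.

$$
\text{IID:}\quad \sup_x |F_n(x) - \Phi(x)| \le \frac{C_0\,\rho}{\sigma^3 \sqrt{n}},
\qquad
\text{not IID:}\quad \sup_x |F_n(x) - \Phi(x)| \le C_1 \,\frac{\sum_{i=1}^n \rho_i}{\left(\sum_{i=1}^n \sigma_i^2\right)^{3/2}}
$$

If $\sigma_i \ge \sigma > 0$ and $\rho_i \le \rho$ for all $i$, then $\sum_i \rho_i \le n\rho$ and $\left(\sum_i \sigma_i^2\right)^{3/2} \ge n^{3/2}\sigma^3$, so

$$
C_1 \,\frac{\sum_{i=1}^n \rho_i}{\left(\sum_{i=1}^n \sigma_i^2\right)^{3/2}} \le \frac{C_1\, n \rho}{n^{3/2} \sigma^3} = \frac{C_1\,\rho}{\sigma^3 \sqrt{n}},
$$

which is the same $1/\sqrt{n}$ rate as in the IID case.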

So, I would guess for a series of not-IID distributions to sum into a Gaussian as poorly as possible (while Berry-Esseen still applies), one would have to pick a series of distributions with as wildly small variances and wildly large skews...? And getting rid of the assumptions of CLT/its generalizations gives that the theorem no longer applies.

Comment by Aaro Salosensaari (aa-m-sa) on Reason isn't magic · 2020-12-10T08:25:02.895Z · LW · GW

>It gets worse. This isn't a randomly selected example - it's specifically selected as a case where reason would have a hard time noticing when and how it's making things worse.

Well, the history of bringing manioc to Africa is not the only example. Scientific understanding of human nutrition (along with disease) had several similar hiccups along the way, several of which have been covered in SSC (can't remember the post titles where):

There was a time when the Japanese army lost many lives to beriberi during the Russo-Japanese war, thinking it was a transmissible disease, several decades [1] after one of the first prominent young Japanese scholars with Western medical training discovered, with a classical trial setup in the Japanese navy, that it was a deficiency related to nutrition (however, he attributed it -- wrongly -- to a deficiency of nitrogen). It took several decades to identify vitamin B1. [2]

Earlier, there was a time when scurvy was a problem in navies, including the British one, but then the British navy (or rather, the East India Company) realized citrus fruits were useful in preventing scurvy, in 1617 [3]. Unfortunately it didn't catch on. Then they discovered it again with an actual trial and published the results, in the 1740s-50s [4]. Unfortunately it again didn't catch on, and the underlying theory was as wrong as the others anyway. Finally, against the scientific consensus at the time, the usefulness of citrus was proven by a Navy rear admiral in 1795 [5]. Unfortunately they still did not have a proper theory of why citrus was supposed to work, so when the Navy switched to lime juice with minimal vitamin C content [6], they managed to reason themselves out of using citrus, and scurvy was determined to be a result of food gone bad [7]. Thus Scott's Antarctic expedition was ill-equipped to prevent scurvy, and soldiers at Gallipoli in 1915 also suffered from scurvy.

The story of discovering vitamin D does not involve as dramatic failings, but prior to the discovery of UV treatment and of vitamin D, John Snow suggested the cause of rickets was adulterated food [8]. Of course, even today one can easily find internet debates about what the "correct" amount of vitamin D supplement is if one gets no sunlight in winter. Solving B12-deficiency-induced anemia appears a true triumph of science, as a Nobel prize was awarded for the dietary recommendation of including liver in the diet [9] before B12 (present in liver) was identified [10].

Some may notice that we have now covered many of the significant vitamins in human diet. I have not even started with the story of Semmelweis.

And anyway, I dislike the whole premise of casting the matter as being "for reason" or "against reason". The issue with manioc, scurvy, beriberi, and hygiene was that people had unfortunate overconfidence in their pre-existing model of reality. With sufficient overconfidence, rationalization or mere "rational speculation", they could explain how seemingly contradictory experimental results actually fitted their model, and thus dismiss the nutrition-based explanations as unscientific hogwash, until the actual workings of vitamins were discovered. (The article [1] is very instructive about the rationalizations the Japanese army could come up with to dismiss the Navy's apparent success in fighting beriberi: ships were easier to keep clean, beriberi was correlated with spending time in contact with damp ground, etc.)

While looking up food-borne diseases for this comment, I was reminded of BSE [11], which is hypothesized to cause vCJD in humans because humans thought it was a good idea to feed dead animals to cattle to improve nutrition (which I suppose it does, barring prion disease). I would view this as a failing from not having a full model of what side-effects the behavior suggested by the partial model would cause.

On the positive side, sometimes the partial model works well enough: It appears that miasma theory of disease like cholera was the principal motivator for building modern sewage systems. While it is today obvious cholera is not caused by miasma, getting rid of smelly sewage in orderly fashion turned out to be a good idea nevertheless [12].

I am uncertain if I have any proper suggested conclusion, except for that, in general, mistakes of reason are possible and possibly fatal, and social dynamics may prevent proper corrective action for a long time. This is important to keep in mind when making decisions, especially novel and unprecedented, and when evaluating the consequences of action. (The consensus does not necessarily budge easily.)

Maybe a more specific conclusion could be: If one has only evidently partial scientific understanding of some issue, it is very possible acting on it can have unintended consequences. It may even not be obvious where the holes in the scientific understanding are. (Paraphrasing the response to Semmelweis: "We don't exactly know what causes childbed fever, it manifests in many different organs so it could be several different diseases, but the idea of invisible corpse particles that defy water and soap is simply laughable.")

Comment by Aaro Salosensaari (aa-m-sa) on Developmental Stages of GPTs · 2020-07-28T23:32:12.516Z · LW · GW

(Reply to gwern's comment but not only addressing gwern.)

Concerning the planning question:

I agree that next-token prediction is consistent with some sort of implicit planning of multiple tokens ahead. I would phrase it a bit differently. Also, "implicit" is doing a lot of work here.

(Please someone correct me if I say something obviously wrong or silly; I do not know how GPT-3 works, but I will try to say something about how it works after reading some sources [1].)

>The bigger point about planning, though, is that the GPTs are getting feedback on one word at a time in isolation. It's hard for them to learn not to paint themselves into a corner.

To recap what I have thus far got from [1]: GPT-3-like transformers are trained by a regimen where the loss function evaluates the prediction error of the next word in the sequence given the previous words. However, I am less sure one can say they do it in isolation. During training (by SGD, I figure?), (i) transformer decoder layers have access to the previous words in the sequence, and (ii) both the attention and feedforward parts of each transformer layer have weights (that are being trained) to compute the output predictions. Also, (iii) the GPT transformer architecture considers all words in each training sequence, left to right, masking the future. And this is done for many meaningful Common Crawl sequences, though the exact same sequences won't repeat.
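To make "left to right, masking the future" concrete, here is a toy sketch of how one training sequence turns into many next-word prediction problems. This is not GPT code and involves no model at all; the helper name `next_token_examples` and the example sentence are made up for illustration, and words stand in for the actual tokens.

```python
def next_token_examples(tokens):
    """Return (context, target) pairs: the target at position i is
    predicted from positions 0..i-1 only, never from later positions."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


sequence = ["the", "cat", "sat", "on", "the", "mat"]
for context, target in next_token_examples(sequence):
    print(context, "->", target)
```

Each target is a single word, but every context is the whole prefix, which is the sense in which the feedback is arguably not given "in isolation".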

So, it sounds a bit trivial that GPT's trained weights allow "implicit planning": if, given a sequence of words w_1 to w_i-1, GPT would output word w for position i, this is because a trained GPT model (loosely speaking, abstracting away many details I don't understand) "dynamically encodes" many plausible "word paths" to word w, and [w_1 ... w_i-1] is such a path; by iteration, it also encodes many word paths from w to other words w', where some words are likelier to follow w than others. The representations in the stack of attention and feedforward layers allow it to generate text much better than e.g. a good old Markov chain. And "self-attending" to some higher-level representation that allows it to generate text in a particular prose style seems a lot like a kind of plan. And GPT generating text that is used as input to it, to which it can again selectively "attend", all seems like a kind of working memory, which will trigger the self-attention mechanism to take certain paths, and so on.

I also want to highlight oceainthemiddleofanisland's comment in the other thread: breaking complicated generation tasks into smaller chunks, getting GPT to output intermediate text from the initial input, which is then given as input to GPT to reprocess, enabling it finally to produce the desired output, sounds quite compatible with this view.

(On this note, I am not sure what to think of the role of the human in the loop here, or in general, how it apparently requires non-trivial work to find a "working" prompt that seeds GPT to obtain the desired results for some particularly difficult tasks. That there are useful, rich world models "in there somewhere" in GPT's weights, but it is difficult to activate them? And are these difficulties because humans are bad at prompting GPT to generate text that accesses the good models, or because GPT's overall model is not always so impressive, as it easily turns to building answers based on gibberish models instead of the good ones, or maybe because GPT has a bad internal model of humans attempting to use GPT? Gwern's example concerning bear attacks was interesting here.)

This would be "implicit planning". Is it "planning" enough? In any case, the discussion would be easier if we had a clearer definition what would constitute planning and what would not.

Finally, a specific response to gwern's comment.

>During each forward pass, GPT-3 probably has plenty of slack computation going on as tokens will differ widely in their difficulty while GPT-3's feedforward remains a fixed-size computation; just as GPT-3 is always asking itself what sort of writer wrote the current text, so it can better imitate the language, style, format, structure, knowledge limitations or preferences* and even typos, it can ask what the human author is planning, the better to predict the next token. That it may be operating on its own past completions and there is no actual human author is irrelevant - because pretending really well to be an author who is planning equals being an author who is planning! (Watching how far GPT-3 can push this 'as if' imitation process is why I've begun thinking about mesa-optimizers and what 'sufficiently advanced imitation' may mean in terms of malevolent sub-agents created by the meta-learning outer agent.)

Using language how GPT-3 is "pretending" and "asking itself what a human author would do" can be maybe justified as metaphors, but I think it is a bit fuzzy and may obscure differences between what transformers do when we say they "plan" or "pretend", and what people would assume of beings who "plan" or "pretend". For example, using a word like "pretend" easily carries over an implication that there is something true, hidden, "unpretense" thinking or personality going on underneath. This appears quite unlikely given a fixed model, and generation mechanism that starts anew from each seed prompt. I would rather say that GPT has a model (is a model?) that is surprisingly good at natural language extrapolation and also, it is surprising at what can be achieved by extrapolation.

[1] , and in addition to skimming original OpenAI papers

Comment by Aaro Salosensaari (aa-m-sa) on Developmental Stages of GPTs · 2020-07-28T09:13:05.225Z · LW · GW

I contend it is not an *implementation* in a meaningful sense of the word. It is more a prose elaboration / expansion of the first generated bullet point list (an inaccurate one: "plan" mentions chopping vegetables, putting them in a fridge and cooking meat; prose version tells of chopping a set of vegetables, skips the fridge and then cooks beef, and then tells an irrelevant story where you go to sleep early and find it is a Sunday and no school).

Mind, substituting abstract category words with sensible, more specific ones (vegetables -> carrots, onions and potatoes) is an impressive NLP task for an architecture where the behavior is not hard-coded in (because that's how some previous natural language generators worked), and it is even more impressive that it can produce the said expansion from an NLP input prompt, but it is hardly a useful implementation of a plan.

An improved experiment of "implementing plans" that could be within capabilities of GPT-3 or similar system: get GPT-3 to first output a plan of doing $a_thing and then the correct keystroke sequence input for UnReal World, DwarfFortress or Sims or some other similar simulated environment to produce it.

Comment by Aaro Salosensaari (aa-m-sa) on Self-sacrifice is a scarce resource · 2020-07-27T17:45:57.652Z · LW · GW

At the risk of stating the very obvious:

The trolley problem (or the fat man variant) is the wrong metaphor for nearly any ethical decision anyway, as there are very few real-life ethical dilemmas that are as visceral, that require immediate action from a very limited set of options, and whose consequences are nevertheless as clear.

Here are a couple of somewhat more realistic matters of life and death. There are many stories (probably I could find factual accounts, but I am too lazy to search for sources) of soldiers who make the snap decision to save the lives of the rest of their squad by jumping on a thrown hand grenade. Yet I doubt many would cast much blame on anyone who had a chance of taking cover, and did that instead. (I wouldn't.) Moreover, the generals who demand that prisoners clear a minefield without proper training or equipment (or agitate impressionable recruits to do so) are to be much frowned upon. And of course, there are untold possibilities to commit a dumb self-sacrifice that achieves nothing.

In general, a military force cannot be very effective without people willing to put themselves in danger: if one finds oneself in agreement with the existence of states and armies, some amount of self-sacrifice follows naturally. For this reason, there are acts of valor that are viewed positively and are to be cultivated. Yet there are also common Western moral sentiments which dictate that it is questionable or outright wrong to require the unreasonable of other people, especially if the beneficiaries or the people doing the requiring are contributing relatively little themselves (a sentiment demonstrated here by Blackadder Goes Forth). And in some cases drawing a judgement is generally considered difficult.

(What should one make of the Charge of the Light Brigade? I am not a military historian, but going by the popular account, the order to charge was stupid, negligent, a mistake, or all three. Yet to some people, there is something inspirational in the foolishness of soldiers fulfilling the order; others would see such views as abhorrent legend-building propaganda that devalues human life.)

In summary, I have not much concrete conclusions to offer, and anyway, details from one context (here, military) do not translate necessarily very well into other aspects of life. In some situations, (some amount of) self-sacrifice may be a good option, maybe even the best or only option for obtaining some outcomes, and it can be good thing to have around. On the other hand, in many situations it is wrong or contentious to require large sacrifices from others, and people who do so (including also extreme persuasion leading to voluntary self-sacrifice) are condemned as taking unjust advantage of others. Much depends on the framing.

As the reader may notice, I am not arguing from any particular systematic theory of ethics, but rehashing my moral intuitions about what is considered acceptable in the West, assuming there is some signal of ethics in there.

Comment by Aaro Salosensaari (aa-m-sa) on Maths writer/cowritter needed: how you can't distinguish early exponential from early sigmoid · 2020-05-06T17:47:17.776Z · LW · GW

"Non-identifiability", by the way, is the search term that does the trick and finds something useful. Please see: Daly et al. [1], section 3. They study the identifiability characteristics of the logistic sigmoid (which has rate r and goes from zero to carrying capacity K over t=0..30) via the Fisher information matrix (FIM). Quote:

When measurements are taken at times t ≤ 10, the singular vector (which is also the eigenvector corresponding to the single non-zero eigenvalue of the FIM) is oriented in the direction of the growth rate r in parameter space. For t ≤ 10, the system is therefore sensitive to changes in the growth rate r, but largely insensitive to changes in the carrying capacity K. Conversely, for measurements taken at times t ≥ 20, the singular vector of the sensitivity matrix is oriented in the direction of the growth rate K[sic], and the system is sensitive to changes in the carrying capacity K but largely insensitive to changes in the growth rate r. Both these conclusions are physically intuitive.

Then Daly et al. proceed with an MCMC scheme to show numerically that samples from different parts of the time domain result in different identifiability of the rate and carrying capacity parameters (Figure 3).
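The quoted behaviour can be reproduced qualitatively with a small sketch. The parameter values (r = 0.5, K = 100, x0 = 1), the sampling grids, the unit-variance Gaussian noise model, and the use of sensitivities with respect to log-parameters are all my own assumptions for illustration; I am not claiming they match the setup in Daly et al.

```python
import numpy as np

def logistic(t, r, K, x0=1.0):
    """Logistic curve starting at x0 with growth rate r and carrying capacity K."""
    E = np.exp(r * t)
    return K * x0 * E / (K + x0 * (E - 1.0))

def relative_sensitivities(t, r, K, x0=1.0):
    """Columns are r * dx/dr and K * dx/dK (sensitivities to log-parameters)."""
    x = logistic(t, r, K, x0)
    dx_dr = t * x * (1.0 - x / K)                    # analytic derivative w.r.t. r
    dx_dK = (x / K) ** 2 * (1.0 - np.exp(-r * t))    # analytic derivative w.r.t. K
    return np.column_stack([r * dx_dr, K * dx_dK])

def dominant_fim_direction(t, r, K, x0=1.0):
    """Absolute components (r, K) of the leading eigenvector of the FIM."""
    S = relative_sensitivities(t, r, K, x0)
    fim = S.T @ S  # FIM under i.i.d. unit-variance Gaussian noise (assumption)
    eigvals, eigvecs = np.linalg.eigh(fim)  # eigenvalues in ascending order
    return np.abs(eigvecs[:, -1])

r, K = 0.5, 100.0  # hypothetical values, inflection point near t = 9.2
early = dominant_fim_direction(np.linspace(0.0, 10.0, 11), r, K)
late = dominant_fim_direction(np.linspace(20.0, 30.0, 11), r, K)
print("early samples, (r, K) weights:", early)
print("late samples,  (r, K) weights:", late)
```

With these assumed values the leading direction points mostly along r for the early samples and mostly along K for the late ones. Note that without some scaling of the parameters the comparison is not very meaningful, since r and K live on very different scales.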

[1] Daly, Aidan C., David Gavaghan, Jonathan Cooper, and Simon Tavener. “Inference-Based Assessment of Parameter Identifiability in Nonlinear Biological Models.” Journal of The Royal Society Interface 15, no. 144 (July 31, 2018): 20180318.


To clarify, because someone might miss it: this is not only a reply to shminux. Daly et al. 2018 is (to some extent) the paper Stuart and others are looking for, at least if you are satisfied with their approach of looking at what happens to the effective Fisher information of logistic dynamics before and after the inflection point, supported by numerical inference methods showing that identifiability is difficult. (Their reference list also contains a couple of interesting articles about optimal design for logistic, harmonic models etc.)

The only thing missing that one might want, AFAIK, is a general analytical quantification of the amount of uncertainty, a comparison to specifically the exponential (maybe along the lines Adam wrote there), and maybe a write-up in an easy-to-digest format.

Comment by Aaro Salosensaari (aa-m-sa) on Maths writer/cowritter needed: how you can't distinguish early exponential from early sigmoid · 2020-05-06T17:40:18.061Z · LW · GW

I was momentarily confused about what k is (it sometimes denotes carrying capacity in the logistic population growth model), but apparently it is the step size (in the numerical integrator)?

I do not have enough expertise here to speak like an expert, but it seems that stiffness would be related in a roundabout way. It seems to describe the difficulties some numerical integrators have with systems like this: the integrator can veer far off the true logistic curve if the steps are not small enough, because the derivative changes fast.
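To make the step-size point concrete, here is a minimal sketch using the forward Euler method on the logistic equation dx/dt = r x (1 - x/K). The choice of integrator and all numerical values are my assumptions, and the sketch only illustrates step-size dependence; it is not meant as a demonstration of stiffness in the formal sense.

```python
import numpy as np

def logistic_exact(t, r, K, x0):
    """Closed-form solution of dx/dt = r * x * (1 - x / K)."""
    E = np.exp(r * t)
    return K * x0 * E / (K + x0 * (E - 1.0))

def euler_logistic(r, K, x0, h, t_end):
    """Integrate the logistic equation with forward Euler and step size h."""
    n_steps = int(round(t_end / h))
    t = h * np.arange(n_steps + 1)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        x[i + 1] = x[i] + h * r * x[i] * (1.0 - x[i] / K)
    return t, x

def max_error(r, K, x0, h, t_end):
    """Largest absolute deviation from the exact curve on the Euler grid."""
    t, x = euler_logistic(r, K, x0, h, t_end)
    return float(np.max(np.abs(x - logistic_exact(t, r, K, x0))))

r, K, x0, t_end = 0.5, 100.0, 1.0, 30.0  # hypothetical values
err_small = max_error(r, K, x0, h=0.01, t_end=t_end)
err_large = max_error(r, K, x0, h=5.0, t_end=t_end)
print("max error with h = 0.01:", err_small)
print("max error with h = 5.0: ", err_large)
```

With the large step the numerical solution lags far behind the true curve around the inflection point and then overshoots the carrying capacity, while the small step tracks the curve closely.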

The phenomenon seems to be more about non-sensitivity than sensitivity of the solution to parameters (or to be precise, non-identifiability of parameters): the part of the solution before the inflection point seems to change very little in response to changes in the "carrying capacity" (curve maximum) parameter.

Comment by Aaro Salosensaari (aa-m-sa) on Maths writer/cowritter needed: how you can't distinguish early exponential from early sigmoid · 2020-05-06T13:04:23.258Z · LW · GW

I was going to suggest that maybe it could be a known and published result in the dynamical systems / population dynamics literature, but I am unable to find anything with Google, and the textbooks I have at hand, while containing plenty of mentions of logistic growth models, do not discuss prediction from partial data before the inflection point.

On the other hand, it is fundamentally a variation on the themes of difficulty in model selection with partial data and dangers of extrapolation, which are common in many numerical textbooks.

If anyone wishes to flesh it out, I believe this behavior is not limited to trying to distinguish exponentials from logistic curves (or different logistics from each other), but also applies to distinguishing different orders of growth from each other in general. With a judicious choice of data range and constants, it is not difficult to create a set of noisy points which could be either from a particular exponential or a particular quadratic curve. Quick example: (And if you limit the range of data points you are looking at to 0 to 2, it is quite impossible to say whether a linear model wouldn't also be plausible.)
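A minimal sketch of this kind of comparison follows. The data range (0 to 2), the noise level, the seed, and the grid-search fit for the exponential are all assumptions of mine chosen for illustration, not the constants of the original example. It generates noisy points from exp(x) and compares the root-mean-square residuals of least-squares exponential, quadratic and linear fits.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3                      # assumed noise standard deviation
x = np.linspace(0.0, 2.0, 20)    # assumed data range
y = np.exp(x) + rng.normal(0.0, sigma, size=x.size)

def rms(residuals):
    """Root-mean-square of the residuals."""
    return float(np.sqrt(np.mean(residuals ** 2)))

def fit_polynomial(x, y, degree):
    """RMS residual of the least-squares polynomial of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    return rms(y - np.polyval(coeffs, x))

def fit_exponential(x, y):
    """RMS residual of the best a * exp(b * x), with b taken from a coarse grid."""
    best = np.inf
    for b in np.linspace(0.1, 3.0, 291):
        basis = np.exp(b * x)
        a = (basis @ y) / (basis @ basis)  # optimal amplitude for this rate
        best = min(best, rms(y - a * basis))
    return best

rms_exp = fit_exponential(x, y)
rms_quad = fit_polynomial(x, y, 2)
rms_lin = fit_polynomial(x, y, 1)
print("RMS residual, exponential:", rms_exp)
print("RMS residual, quadratic:  ", rms_quad)
print("RMS residual, linear:     ", rms_lin)
```

When the exponential and quadratic residuals are both about the size of the noise, the data alone give little reason to prefer one model over the other. The linear fit is included so that the effect of the noise level and range on its plausibility can be checked directly.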

Comment by Aaro Salosensaari (aa-m-sa) on Against strong bayesianism · 2020-05-04T23:29:09.257Z · LW · GW

I am happy that you mention Gelman's book (I am studying it right now). I think lots of "naive strong bayesianists" would benefit from a thoughtful study of the BDA book (there are lots of worked out demos and exercises available for it) and maybe some practical application of Bayesian modelling to some real-world statistical problems. The practice of the "Bayesian way of life" of "updating my priors" always sounds a bit too easy in contrast to doing genuine statistical inference.

For example, here are a couple of puzzles that I am myself still unsure how to answer properly and with full confidence: Why would one be interested in doing stratified random sampling in an epidemiological study instead of the naive "collect every data point that you see and then do a Bayesian update"? How do multiple comparisons corrections for classical frequentist p-values map into the Bayesian statistical framework? Does it matter for LWian Bayesianism whether you are doing your practical statistical analyses with frequentist or Bayesian analysis tools (especially if many frequentist methods can be seen as clever approximations to a full Bayesian model; see e.g. the discussion of Kneser-Ney smoothing as ad hoc Pitman-Yor process inference here: ; a similar relationship exists between k-means and the EM algorithm for a Gaussian mixture model)? And if there is no difference, is the philosophical Bayesianism then actually that important -- or important at all -- for rationality?

Comment by Aaro Salosensaari (aa-m-sa) on Open & Welcome Thread—May 2020 · 2020-05-04T17:12:02.647Z · LW · GW

Howdy. I came across Ole Peters' "ergodicity economics" some time ago, and was interested to see what LW made of it. Apparently one set of skeptical journal club meetup notes:

I am not sure what to make of criticisms of Seattle meetups (they appear correct, but I am not sure if they are relevant; see my comment there).

Not planning to write a proper post, but here is an example blog post of Peters' which I found illustrative and which demonstrates why I think the "ergodicity way of thinking" might have something in it: . In summary, looking at an aggregate ensemble quantity such as GDP per capita does not tell much about what happens to individuals in the ensemble: the growth experienced by the typical individual in the population is in general not related to growth in GDP per capita (which may be obvious to a numerate person but not necessarily so, given the importance given to GDP in public discussion). And if one takes the average of the exponential growth rates, one obtains a measure (geometric mean income, which they dub "DDP") that is known in the economics literature but was originally derived otherwise.
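A toy calculation may make the distinction clearer. The three incomes below are made-up numbers, not data from Peters' post; the sketch compares growth of mean income (the per-capita aggregate) with the average of individual exponential growth rates, which equals the growth of geometric mean income.

```python
import numpy as np

# Hypothetical incomes of three individuals at two points in time.
income_before = np.array([10.0, 10.0, 100.0])
income_after = np.array([11.0, 11.0, 150.0])

def geometric_mean(values):
    """Geometric mean computed via the mean of logarithms."""
    return float(np.exp(np.mean(np.log(values))))

# Growth factor of the aggregate (mean income, i.e. "per capita").
per_capita_growth = float(income_after.mean() / income_before.mean())

# Average of the individual exponential (log) growth rates, as a growth factor.
log_growth_rates = np.log(income_after / income_before)
typical_growth = float(np.exp(log_growth_rates.mean()))

# The same number obtained as growth of the geometric mean income.
geo_mean_growth = geometric_mean(income_after) / geometric_mean(income_before)

print("per-capita growth factor:     ", per_capita_growth)
print("average individual growth:    ", typical_growth)
print("geometric mean income growth: ", geo_mean_growth)
```

In this toy population the per-capita figure grows by about 43% because it is dominated by the richest individual, while the average individual growth factor is about 1.22.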

But maybe this looks insightful to me because I am not that well-versed in the economics literature, so it would be nice to have some critical discussion about this.

Comment by Aaro Salosensaari (aa-m-sa) on Meetup Notes: Ole Peters on ergodicity · 2020-05-04T16:28:34.511Z · LW · GW

Peters' December 2019 Nature Physics paper ( ) provides some perspective on the 0.6/1.5x coin flip example and other conclusions of the above discussion. (If Peters' claims have changed along the way, I wouldn't know.)

In my reading, Peters' basic claim there is not that ergodicity economics can solve the coin flip game in a way that classical economics cannot (because it can, by switching to expected log wealth utility instead of expected wealth), but that the utility functions as originally presented are a crutch that misinforms us about people's psychological motives in making economic decisions. So, while the mathematics of many parts stays the same, the underlying phenomena can be more saliently reasoned about by looking at the individual growth rates in the context of whether the associated wealth "process" is additive or multiplicative or something else. Thus there is also less need to use lingo where people may have an (innate, weirdly) "risk-averse utility function" (as compared to some other less risk-averse theoretical utility function).
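For reference, the arithmetic of the coin flip game can be sketched as follows. I am assuming the game multiplies wealth by 1.5 or 0.6 with equal probability each round, which is my reading of the "0.6/1.5x" example; the number of players, the number of rounds and the seed are arbitrary.

```python
import numpy as np

up, down, p = 1.5, 0.6, 0.5  # assumed game: fair coin, wealth x1.5 or x0.6

# Expected wealth multiplier per round (ensemble average).
ensemble_factor = p * up + (1 - p) * down  # 1.05

# Growth factor implied by expected log wealth (time average).
time_average_factor = float(np.exp(p * np.log(up) + (1 - p) * np.log(down)))  # sqrt(0.9), about 0.9487

# Simulate many players starting from wealth 1.
rng = np.random.default_rng(0)
n_players, n_rounds = 10_000, 100
factors = rng.choice([down, up], size=(n_players, n_rounds))
final_wealth = factors.prod(axis=1)
median_final = float(np.median(final_wealth))

print("ensemble-average factor per round:", ensemble_factor)
print("time-average factor per round:    ", time_average_factor)
print("median final wealth:              ", median_final)
```

The expected wealth grows by 5% per round, yet the per-round growth factor for an individual trajectory is below one, so the median player ends up with far less than they started with. Both the expected log wealth calculation and the growth rate calculation give the same number here, which is the sense in which the mathematics stays the same.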