May 2021 newsletter 2021-06-11T14:13:18.485Z
"Decision Transformer" (Tool AIs are secret Agent AIs) 2021-06-09T01:06:57.937Z
April 2021 newsletter 2021-06-03T15:13:29.138Z
gwern's Shortform 2021-04-24T21:39:14.128Z
March 2021 newsletter 2021-04-06T14:06:20.198Z
February 2021 newsletter 2021-03-13T14:57:54.645Z
January 2021 newsletter 2021-02-04T20:12:39.555Z
December 2020 links 2021-01-10T17:21:40.756Z
November 2020 newsletter 2020-12-03T22:47:16.917Z
October 2020 newsletter 2020-11-01T21:38:46.795Z
/r/MLScaling: new subreddit for NN scaling research/discussion 2020-10-30T20:50:25.973Z
"Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} 2020-10-29T01:45:30.666Z
September 2020 newsletter 2020-10-26T13:38:51.107Z
August 2020 newsletter 2020-09-01T21:04:58.299Z
July 2020 newsletter 2020-08-20T16:39:27.202Z
June 2020 newsletter 2020-07-02T14:19:08.696Z
GPT-3 Fiction Samples 2020-06-25T16:12:05.422Z
May newsletter (w/GPT-3 commentary) 2020-06-02T15:40:37.155Z
OpenAI announces GPT-3 2020-05-29T01:49:04.855Z
"AI and Efficiency", OA (44✕ improvement in CNNs since 2012) 2020-05-05T16:32:20.335Z
April 2020 newsletter 2020-05-01T20:47:44.867Z
March 2020 newsletter 2020-04-03T02:16:02.871Z
February 2020 newsletter 2020-03-04T19:05:16.079Z
January 2020 newsletter 2020-01-31T18:04:21.945Z
Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal 2020-01-08T22:20:20.290Z
Dec 2019 newsletter 2020-01-04T20:48:48.788Z
Nov 2019 newsletter 2019-12-02T21:16:04.846Z
October 2019 newsletter 2019-11-14T20:26:34.236Z
September 2019 newsletter 2019-10-04T16:44:43.147Z
"AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence", Clune 2019 2019-09-10T21:33:08.837Z
August 2019 newsletter (popups.js demo) 2019-09-01T17:52:01.011Z
"Designing agent incentives to avoid reward tampering", DeepMind 2019-08-14T16:57:29.228Z
July 2019 newsletter 2019-08-01T16:19:59.893Z
How Should We Critique Research? A Decision Perspective 2019-07-14T22:51:59.285Z
June 2019 newsletter 2019-07-01T14:35:49.507Z
On Seeing Through 'On Seeing Through: A Unified Theory': A Unified Theory 2019-06-15T18:57:25.436Z
On Having Enough Socks 2019-06-13T15:15:21.946Z
May newsletter 2019-06-01T17:25:11.740Z
"One Man's Modus Ponens Is Another Man's Modus Tollens" 2019-05-17T22:03:59.458Z
April 2019 newsletter 2019-05-01T14:43:18.952Z
Recent updates to (2017–2019) 2019-04-28T20:18:27.083Z
"Everything is Correlated": An Anthology of the Psychology Debate 2019-04-27T13:48:05.240Z
March 2019 newsletter 2019-04-02T14:17:38.032Z
February newsletter 2019-03-02T22:42:09.490Z
'This Waifu Does Not Exist': 100,000 StyleGAN & GPT-2 samples 2019-03-01T04:29:16.529Z
January 2019 newsletter 2019-02-04T15:53:42.553Z
"Forecasting Transformative AI: An Expert Survey", Gruetzemacher et al 2019 2019-01-27T02:34:57.214Z
"AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] 2019-01-24T20:49:01.350Z
Visualizing the power of multiple step selection processes in JS: Galton's bean machine 2019-01-12T17:58:34.584Z
Littlewood's Law and the Global Media 2019-01-12T17:46:09.753Z


Comment by gwern on What are the gears of gluten sensitivity? · 2021-06-09T02:22:27.298Z · LW · GW

If you are concerned about gluten sensitivity, why not directly test for the antibodies or celiac-related genetic variants (eg 23andMe)? You can do both at home via mail for like $200 total. That information sounds much more dispositive than reducing gluten and maybe observing some effect, and given the long-term harms of problems like celiac, this is not a problem one wants to cheap out on solving.

Comment by gwern on "Decision Transformer" (Tool AIs are secret Agent AIs) · 2021-06-09T01:07:49.698Z · LW · GW

Rewards need not be written in natural language as crudely as "REWARD: +10 UTILONS". Something to think about as you continue to write text online.

And what of the dead? I own that I thought of myself, at times, almost as dead. Are they not locked below ground in chambers smaller than mine was, in their millions of millions? There is no category of human activity in which the dead do not outnumber the living many times over. Most beautiful children are dead. Most soldiers, most cowards. The fairest women and the most learned men – all are dead. Their bodies repose in caskets, in sarcophagi, beneath arches of rude stone, everywhere under the earth. Their spirits haunt our minds, ears pressed to the bones of our foreheads. Who can say how intently they listen as we speak, or for what word?

Comment by gwern on Curated conversations with brilliant rationalists · 2021-06-01T16:00:33.387Z · LW · GW

IMO, that's shockingly cheap, and there's little reason to not do transcripts for any podcast which has a listening audience larger than "your gf and your dog" and pretensions to being more than tissue-level entertainment to be discarded after use. If a podcast is worth taking hours to make, worth expecting hundreds/thousands of listeners to spend man-hours apiece sitting through, and worth trying to advertise or spread in any way, then it's almost certainly also worth $100 to transcribe. A transcript buys you search-engine visibility (as well as easy search/quotation in general), foreign audiences (reading is a lot easier than listening), the ability to annotate with links/references, and a lot of native listeners who don't want to sit through it in realtime (reading is also vastly faster than listening). Notice how much more often you see Econlog, 80k Hours, or Tyler Cowen's Conversations linked than many other podcasts, which decline to provide transcripts, and whose episodes instantly disappear*.

* I'm looking at you, A16Z. Not transcribing your podcasts is ludicrous when you are one of the largest VC firms in the world and attempting to remake yourself into an all-services VC empire based in considerable part on content marketing.

Comment by gwern on The EMH is False - Specific Strong Evidence · 2021-05-30T16:36:34.129Z · LW · GW

There are currently high return trades (5% a month at least, possibly more) with extremely low risk (you can lose 1-2% max, probably less depending on execution).

Worth noting that a new Metaculus market estimates a ~50% chance that Polymarket proves to be a counterparty risk in some sense during 2021–2022.

Comment by gwern on Article on IQ: The Inappropriately Excluded · 2021-05-29T22:11:38.381Z · LW · GW

The sample consisted of mid-level leaders from multinational private-sector companies.

This sort of pre-filtered sample suffers from issues like Berkson's paradox. For example, for those managers who have IQ>120, why are they underperforming? Perhaps for lack of leadership qualities, which they make up for with intelligence. On the flip side, for managers who have unimpressive IQs (as low as <100), why are they so successful? This is why longitudinal samples like SMPY are so much more useful when you want to talk about what high IQs are or are not good for. If you run this sort of cross-sectional design, you find things like "Conscientiousness is inversely correlated with intelligence" (it's not).
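The Berkson's-paradox point is easy to see in a toy simulation (all numbers here are invented for illustration): give everyone in a population independent IQ and "leadership" scores, promote to manager only those whose combination clears a bar, and the two traits become negatively correlated within managers even though they are uncorrelated in the population.

```python
import random

# Toy Berkson's-paradox simulation (assumed, made-up numbers): IQ and
# leadership are independent in the population, but becoming a manager
# requires their (standardized) sum to clear a threshold.
random.seed(0)
population = [(random.gauss(100, 15), random.gauss(0, 1)) for _ in range(100_000)]
managers = [(iq, lead) for iq, lead in population if iq / 15 + lead > 8]

def corr(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    sx = (sum((x - mx) ** 2 for x, _ in pairs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for _, y in pairs) / n) ** 0.5
    return cov / (sx * sy)

print(corr(population))  # ~0: independent in the full population
print(corr(managers))    # clearly negative among the selected managers
```

The smart-but-promoted managers tend to be the ones weak on leadership, and vice versa, purely as an artifact of the selection step.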

Comment by gwern on Re: Fierce Nerds · 2021-05-22T01:42:16.256Z · LW · GW

"Fierce nerd" sounds a bit like rediscovering Eysenck's paradigm of genius: intelligence, energy, and Psychoticism (essentially, low Agreeableness).

Comment by gwern on Get your gun license · 2021-05-21T20:52:04.424Z · LW · GW

Considering how frequent mental issues are around here, this post seems to buy entirely the wrong kinds of optionality.

EDIT: oh look what's on the main page a day later

Comment by gwern on Open and Welcome Thread - May 2021 · 2021-05-20T19:18:47.777Z · LW · GW

The #lesswrong IRC channel has moved to Libera due to drama.

Some background links:

Comment by gwern on What will 2040 probably look like assuming no singularity? · 2021-05-18T00:05:36.893Z · LW · GW

The 3 babies from He Jiankui will be adults by then, definitely; one might quibble about how 'designer' they are, but most people count selection as 'designer' and GenPred claims to have at least one baby so far selected on their medical PGSes (unclear if they did any EDU/IQ PGSes in any way, but as I've always pointed out, because of the good genetic correlations of those with many diseases, any selection on complex diseases will naturally also boost those).

Comment by gwern on How to determine the value of optionality? · 2021-05-17T02:30:50.227Z · LW · GW

The value of optionality is defined by drawing out the decision tree for the scenarios with and without the option, doing backwards induction for the optimal strategy and estimating the value of each. (In financial option theory, you calculate the price of a literal option by simulating out all of the possible price trajectories and how you would respond to them, to figure out what would be a too cheap or too expensive price.) Because scenarios can be arbitrarily complex, no general answer is possible. If an option wouldn't be used at any state of the world, it might have a value of $0, for example, and this is automatically taken into account: the backwards induction will produce a policy that never invokes the option, and the difference in the value of the 2 scenarios = $0 option value.

For the house scenario, you would, say, define scenarios where each month you can sell/rent/live-in-it and there are random shocks (like Airbnb prices going up/down or housing prices going up/down, I guess), and a horizon of like 10 years and then do backwards induction to understand the value of being able to exploit decreases in Airbnb prices or to shelter in your house from Airbnb price surges.
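As a concrete sketch of that recipe (toy numbers throughout, not a real housing model): suppose each month you collect a rent subject to random shocks, and the "option" is the right to sell at a fixed price; backwards induction from the horizon gives the optimal-policy value of each scenario, and the option's value is the difference.

```python
# Toy backwards-induction sketch of option value (all numbers invented).
# Scenario A: you may sell the house at a fixed price in any month.
# Scenario B: no such option. Option value = difference between the
# optimal-policy values of the two scenarios.

HORIZON = 12                 # months
SELL_PRICE = 100.0           # assumed fixed sale price
SHOCKS = [-2.0, 0.0, 2.0]    # equally likely monthly rent shocks

def value(month, rent, can_sell, cache={}):
    """Expected value of the optimal policy from this state onward."""
    if month == HORIZON:
        return 0.0
    key = (month, rent, can_sell)
    if key not in cache:
        # Hold: collect this month's rent, then face a random shock.
        hold = rent + sum(value(month + 1, rent + s, can_sell)
                          for s in SHOCKS) / len(SHOCKS)
        # Sell now (ending the game), if the option exists.
        sell = SELL_PRICE if can_sell else float("-inf")
        cache[key] = max(hold, sell)
    return cache[key]

option_value = value(0, 5.0, True) - value(0, 5.0, False)
print(option_value)
```

An option that the optimal policy never exercises contributes exactly $0, automatically, just as described: the `max()` simply never picks `sell`, and the two scenario values coincide.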

Comment by gwern on Agency in Conway’s Game of Life · 2021-05-13T14:23:46.227Z · LW · GW

OP said I can initialize a large chunk as I like (which I initialize to be empty aside from my constructors to avoid interfering with placing the pixels), and then the rest might be randomly or arbitrarily initialized, which is why I brought up the wall of still-life eaters to seal yourself off from anything that might then disrupt it. If his specific values don't give me enough space, but larger values do, then that's an answer to the general question as nothing hinges on the specific values.

Comment by gwern on Agency in Conway’s Game of Life · 2021-05-13T02:38:54.034Z · LW · GW

My immediate impulse is to say that it ought to be possible to create the smiley face, and that it wouldn't be that hard for a good Life hacker to devise it.

I'd imagine it to go something like this. Starting from a Turing machine or simpler, you could program it to place arbitrary 'pixels': either by finding a glider-like construct which terminates at specific distances into a still life, so the constructor can crawl along an x/y axis, shooting off the terminating-glider to create stable pixels in a pre-programmed pattern; or, if that doesn't exist, one could use two constructors crawling along the x/y axes, shooting off gliders intended to collide, with the delays properly pre-programmed. The constructor then terminates in a stable still life; this guarantees perpetual stability of the finished smiley face. If one wants to specify a more dynamic environment for realism, then the constructor can also 'wall off' the face using still blocks. Once that's done, nothing from the outside can possibly affect it, and it's internally stable, so the pattern is then eternal.

Comment by gwern on Self-Predicting Markets · 2021-05-12T17:37:47.712Z · LW · GW

To update on this: Hertz stock is now worth $5-8 as it comes out of bankruptcy. I hope OP didn't short it, because he would've lost his shorts based on his belief that EMH is false and he's smarter than the markets.

Comment by gwern on Challenge: know everything that the best go bot knows about go · 2021-05-12T03:48:50.519Z · LW · GW

An even more pointed example: chess endgame tables. What does it mean to 'fully understand' it beyond understanding the algorithms which construct them, and is it a reasonable goal to attempt to play chess endgames as well as the tables?

Comment by gwern on [link] If something seems unusually hard for you, see if you're missing a minor insight · 2021-05-05T20:07:19.704Z · LW · GW

This reminds me of pg:

If you think something's supposed to hurt, you're less likely to notice if you're doing it wrong. That about sums up my experience of graduate school.

"How To Do What You Love"

(Of course, there's a certain aspect of learned-helplessness here: because so many things are terrible, people often assume that something is just another broken malicious tool or workflow, when it's quite the opposite.)

But really the single most important way to learn to use a search engine is this: Know people who are better at using search engines than you, and when you get stuck ask them for help and ask them to explain what they did and why they did it, and remember that for next time.

And if you're the good one, make a list of case-studies.

Comment by gwern on interpreting GPT: the logit lens · 2021-05-01T02:17:37.213Z · LW · GW


Comment by gwern on gwern's Shortform · 2021-04-24T22:09:28.493Z · LW · GW

2-of-2 escrow: what is the exploding Nash equilibrium? Did it really originate with NashX? I've been looking for the history & real name of this concept for years now and have failed to refind it. Anyone?

Comment by gwern on gwern's Shortform · 2021-04-24T21:47:50.065Z · LW · GW

Humanities satirical traditions: I always enjoy the CS/ML/math/statistics satire in the annual SIGBOVIK and Ig Nobels; physics has Arxiv April Fools papers (like "On the Impossibility of Supersized Machines") & journals like Special Topics; and medicine has the BMJ Christmas issue, of course.

What are the equivalents in the humanities, like sociology or literature? (I asked a month ago on Twitter and got zero suggestions...)

Comment by gwern on gwern's Shortform · 2021-04-24T21:39:16.652Z · LW · GW

Normalization-free Bayes: I was musing on Twitter about what the simplest possible still-correct computable demonstration of Bayesian inference is, that even a middle-schooler could implement & understand. My best candidate so far is ABC Bayesian inference*: simulation + rejection, along with the 'possible worlds' interpretation.

Someone noted that rejection sampling is simple but needs normalization steps, which adds complexity back. I recalled that somewhere on LW many years ago someone had a comment about a Bayesian interpretation where you don't need to renormalize after every likelihood computation, and every hypothesis just decreases at different rates; as strange as it sounds, it's apparently formally equivalent. I thought it was by Wei Dai, but I can't seem to refind it because queries like 'Wei Dai Bayesian decrease' obviously pull up way too many hits, it's probably buried in an Open Thread somewhere, my Twitter didn't help, and Wei Dai didn't recall it at all when I asked him. Does anyone remember this?

* I've made a point of using ABC in some analyses simply because it amuses me that something so simple still works, even when I'm sure I could've found a much faster MCMC or VI solution with some more work.

Incidentally, I'm wondering if the ABC simplification can be taken further to cover subjective Bayesian decision theory as well: if you have sets of possible worlds/hypotheses, let's say discrete for convenience, and you do only penalty updates as rejection sampling of worlds that don't match the current observation (like AIXI), can you then implement decision theory normally by defining a loss function and maximizing over it? In which case you can get Bayesian decision theory without probabilities, calculus, MCMC, VI, etc or anything more complicated than a list of numbers and a few computational primitives like coinflip().
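For concreteness, here is about the simplest form of the ABC idea I have in mind, small enough for a middle-schooler to run (the coin example, uniform prior, and sample sizes are of course arbitrary choices): simulate many possible worlds, reject every world that fails to reproduce the observation, and the surviving worlds simply are the posterior; no normalization constant is ever computed.

```python
import random

# Minimal ABC (simulation + rejection) sketch of the 'possible worlds'
# interpretation: infer a coin's bias from 7 heads in 10 flips.
random.seed(0)
observed_heads, n_flips = 7, 10

surviving_worlds = []
for _ in range(100_000):
    bias = random.random()  # uniform prior: each world picks a bias
    heads = sum(random.random() < bias for _ in range(n_flips))
    if heads == observed_heads:  # reject worlds that don't match the data
        surviving_worlds.append(bias)

posterior_mean = sum(surviving_worlds) / len(surviving_worlds)
print(posterior_mean)  # should land near the exact Beta(8,4) mean, 8/12
```

Note there is no likelihood formula and no renormalization anywhere; the rejection step does all the work, which is why this still works (if slowly) where a student couldn't yet follow MCMC or VI.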

Comment by gwern on Gradations of Inner Alignment Obstacles · 2021-04-21T16:52:07.599Z · LW · GW

I claim that if we're clever enough, we can construct a hypothetical training regime T' which trains the NN to do nearly or exactly the same thing on T, but which injects malign behavior on some different examples. (Someone told me that this is actually an existing area of study; but, I haven't been able to find it yet.)

I assume they're referring to data poisoning backdoor attacks.

Comment by gwern on How can we increase the frequency of rare insights? · 2021-04-20T14:49:21.273Z · LW · GW

Have you looked at the "incubation effect"?

Comment by gwern on Parameter count of ML systems through time? · 2021-04-19T16:36:53.384Z · LW · GW

It's not the numerical precision but the model architecture being sparse such that you only activate a few experts at runtime, and only a small fraction of the model runs for each input. It may be 1.3t parameters or whatever, but then at runtime, only, I dunno, 20b parameters actually compute anything. This cheapness of forward passes/inferencing is the big selling point of MoE for training and deployment: that you don't actually ever run 1.3t parameters. But it's hard for parameters which don't run to contribute anything to the final result, whereas in GPT-3, pretty much all of those 175b parameters can participate in each input. It's much clearer if you think about comparing them in terms of FLOPS at runtime, rather than static parameter counts. GShard/Switch is just doing a lot less.

(I also think that the scaling curves and comparisons hint at Switch learning qualitatively worse things, and the modularity encouraging more redundancy and memorization-heavy approaches, which impedes any deeper abstractions or meta-learning-like capabilities that a deep dense model might learn. But this point is much more speculative, and not necessarily something that, say, translation researchers would care too much about.)

This point about runtime also holds for those chonky embeddings people sometimes bring up as examples of 'models with billions of parameters': sure, you may have a text or category embedding which has billions of 'parameters', but for any specific input, only a handful of those parameters actually do anything.
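The runtime comparison can be made concrete with a back-of-the-envelope sketch (both the 2-FLOPs-per-active-parameter rule of thumb and the ~20b active-parameter figure are rough assumptions, not measured numbers):

```python
# Back-of-the-envelope sketch (illustrative assumptions, not measurements):
# a forward pass costs roughly 2 FLOPs per *active* parameter per token,
# so a sparse model's static parameter count wildly overstates its
# per-input compute.

def flops_per_token(active_params):
    return 2 * active_params  # rough rule of thumb

dense_gpt3 = flops_per_token(175e9)  # all 175b parameters participate
sparse_moe = flops_per_token(20e9)   # ~20b of 1.3t parameters active (assumed)

print(f"dense : {dense_gpt3:.1e} FLOPs/token")
print(f"sparse: {sparse_moe:.1e} FLOPs/token")
print(f"dense model does {dense_gpt3 / sparse_moe:.2f}x more compute per input")
```

On these assumed numbers, the nominally 7x-larger MoE is doing roughly an order of magnitude less computation per input than the dense model.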

Comment by gwern on LessWrong help desk - free paper downloads and more · 2021-04-19T14:41:13.792Z · LW · GW

If Reddit falls through, email me and I can order a scan for you. (Might want to delete your duplicate comments here too.) EDIT: ordered a scan

Comment by gwern on Parameter count of ML systems through time? · 2021-04-19T14:07:07.574Z · LW · GW

You should probably also be tracking the kind of parameter. I see you have Switch and GShard in there, but, as you can see in how they are visibly outliers, MoEs (and embeddings) use much weaker 'parameters', as it were, than dense models like GPT-3 or Turing-NLG. Plotting by FLOPS would help correct for this - perhaps we need graphs like training-FLOPS per parameter? That would also help correct for comparisons across methods, like to older architectures such as SVMs. (Unfortunately, this still obscures that the key thing about Transformers is better scaling laws than RNNs or n-grams etc, where the high FLOPS-per-parameter translates into better curves...)

Comment by gwern on Fun with +12 OOMs of Compute · 2021-04-18T18:18:14.660Z · LW · GW

Comment by gwern on March 2021 newsletter · 2021-04-06T20:31:56.806Z · LW · GW

"'Nash equilibrium strategy' is not necessarily synonymous to 'optimal play'. A Nash equilibrium can define an optimum, but only as a defensive strategy against stiff competition. More specifically: Nash equilibria are hardly ever maximally exploitive. A Nash equilibrium strategy guards against any possible competition including the fiercest, and thereby tends to fail taking advantage of sub-optimum strategies followed by competitors. Achieving maximally exploitive play generally requires deviating from the Nash strategy, and allowing for defensive leaks in one's own strategy."

Comment by gwern on 2020 AI Alignment Literature Review and Charity Comparison · 2021-04-02T22:58:29.757Z · LW · GW

That's interesting. I did see YC listed as a major funding source, but given Sam Altman's listed loans/donations, I assumed, because YC has little or nothing to do with Musk, that YC's interest was Altman, Paul Graham, or just YC collectively. I hadn't seen anything at all about YC being used as a cutout for Musk. So assuming the Guardian didn't screw up its understanding of the finances there completely (the media is constantly making mistakes in reporting on finances and charities in particular, but this seems pretty detailed and specific and hard to get wrong), I agree that that confirms Musk did donate money to get OA started and it was a meaningful sum.

But it still does not seem that Musk donated the majority or even plurality of OA donations, much less the $1b constantly quoted (or any large fraction of the $1b collective pledge, per ESRogs).

Comment by gwern on The best frequently don't rise to the top · 2021-03-26T14:59:17.330Z · LW · GW

One of the most interesting media experiments I know of is the Yahoo Media experiments:

  1. "Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market", Salganik et al 2006:

    We investigated this paradox experimentally, by creating an artificial "music market" in which 14,341 participants downloaded previously unknown songs either with or without knowledge of previous participants' choices. Increasing the strength of social influence increased both inequality and unpredictability of success. Success was also only partly determined by quality: The best songs rarely did poorly, and the worst rarely did well, but any other result was possible.

  2. "Web-Based Experiments for the Study of Collective Social Dynamics in Cultural Markets", Salganik & Watts 2009:

    Using a "multiple-worlds" experimental design, we are able to isolate the causal effect of an individual-level mechanism on collective social outcomes. We employ this design in a Web-based experiment in which 2,930 participants listened to, rated, and downloaded 48 songs by up-and-coming bands. Surprisingly, despite relatively large differences in the demographics, behavior, and preferences of participants, the experimental results at both the individual and collective levels were similar to those found in Salganik, Dodds, and Watts (2006)...A comparison between Experiments 1 and 2 reveals a different pattern. In these experiments, there was little change at the song level; the correlation between average market rank in the social influence worlds of Experiments 1 and 2 was 0.93.

This is analogous to test-retest error: if you run a media market with the same authors, and same creative works, how often do you get the same results? Forget completely any question about how much popularity correlates with 'quality' - does popularity even correlate with itself consistently? If you ran the world several times, how much would the same songs float to the top?

The most relevant rank correlation they seem to report is rho=0.93*. That may seem high, but the more datapoints there are, the higher the necessary correlation soars to give the results you want.

A rho=0.93 implies that if you had a million songs competing in a popularity contest, the #1 popular song in our world would probably be closer to only the #35,000th most popular song in a parallel world's contest as it regresses to the mean (1000000 - (500000 + (500000 * 0.93))). (As I noted the other day, even in very small samples you need extremely high correlations to guarantee double-maxes or similar properties, once you move beyond means; our intuitions don't realize just what an extreme demand we make when we assume that, say, J.K. Rowling must be a very popular successful writer in most worlds simply because she's a billionaire in this world, despite how many millions of people are writing fiction and competing with her. Realistically, she would be a minor but respected author who might or might not've finished out her HP series as sales flagged for multi-volume series; sort of like her crime novels published pseudonymously.)

Then toss in the undoubtedly <<1 correlation between popularity and any 'quality'... It is indeed no surprise that, out of the millions and millions of chefs over time, the best chefs in the world are not the most popular YouTube chefs. Another example of 'the tails come apart' at the extremes and why order statistics are counterintuitive.

* They also report a rho=0.52 from some other experiments, which are arguably now more relevant than the 0.93 estimate. Obviously, if you use 0.52 instead, my point gets much much stronger: then, out of a million, you regress from #1 to #240,000!
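The regression-to-the-mean arithmetic above can be spelled out as a few lines of code (this is the same deliberately crude heuristic as in the text, shrinking a rank's deviation from the median by the test-retest correlation, not a proper order-statistics calculation):

```python
# Crude rank-regression heuristic (not real order statistics): treat a
# rank as a deviation from the median rank, and shrink that deviation by
# the test-retest correlation to guess the rank in a replication.

def regressed_rank(n, rho):
    """Roughly where this world's #1 lands in a parallel world's contest."""
    median = n / 2
    return n - (median + median * rho)

print(regressed_rank(1_000_000, 0.93))  # ~#35,000th
print(regressed_rank(1_000_000, 0.52))  # ~#240,000th
```

Even a seemingly sky-high rho=0.93 sends #1-in-a-million down to roughly #35,000; at rho=0.52, down to roughly #240,000.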

Comment by gwern on The EMH is False - Specific Strong Evidence · 2021-03-25T15:27:13.086Z · LW · GW

I knew someone was going to ask that. Yes, it's impure indexing, it's true. The reason is the returns to date on the whole-world indexes have been lower, the expense is a bit higher, and after thinking about it, I decided that I do have a small opinion about the US overperforming (mostly due to tech/AI and a general sense that people persistently underestimate the US economically) and feel pessimistic about the rest of the world. Check back in 20 years to see how that decision worked out...

Comment by gwern on Against evolution as an analogy for how humans will create AGI · 2021-03-25T00:31:30.817Z · LW · GW

Further reading:

Comment by gwern on Against evolution as an analogy for how humans will create AGI · 2021-03-24T15:55:32.256Z · LW · GW

As described above, I expect AGI to be a learning algorithm—for example, it should be able to read a book and then have a better understanding of the subject matter. Every learning algorithm you’ve ever heard of—ConvNets, PPO, TD learning, etc. etc.—was directly invented, understood, and programmed by humans. None of them were discovered by an automated search over a space of algorithms. Thus we get a presumption that AGI will also be directly invented, understood, and programmed by humans.

For a post criticizing the use of evolution for end-to-end ML, this post seems to be pretty strawmanish and generally devoid of any grappling with the Bitter Lesson, the end-to-end principle, Clune's arguments for generativity and the AI-GAs program to soup up self-play for goal generation/curriculum learning, or any actual research on evolving better optimizers, DRL, or SGD itself... Where's Schmidhuber, Metz, or AutoML-Zero? Are we really going to dismiss PBT evolving populations of agents in the AlphaLeague as just 'tweaking a few human-legible hyperparameters'? Why isn't Co-Reyes et al 2021 an example of evolutionary search inventing TD learning, the sort of thing you claim is absurd and has never happened?

Comment by gwern on Thirty-three randomly selected bioethics papers · 2021-03-24T15:31:15.436Z · LW · GW

This was exactly what I expected. The problem with the field of bioethics has never been the papers being 100% awful, but how it operates in the real world, the asymmetry of interventions, and what its most consequential effects have been. I would have thought 2020 made this painfully clear. (That is, my grandmother did not die of coronavirus while multiple highly-safe & highly-effective vaccines sat on the shelf unused, simply because some bioethicist screwed up a p-value in a paper somewhere. If only!)

The actual day-to-day churn of publishing bioethics papers/research... Well, HHGttG said it best in describing humans in general:

Mostly Harmless.

Comment by gwern on The EMH is False - Specific Strong Evidence · 2021-03-23T14:55:07.292Z · LW · GW

I haven't heard that claim before. My understanding was that such a claim would be improbable or cherrypicking of some sort, as a priori risk-adjusted etc returns should be similar or identical but by deliberately narrowing your index, you do predictably lose the benefits of diversification. So all else equal (such as fees and accessibility of making the investment), you want the broadest possible index.

Comment by gwern on The EMH is False - Specific Strong Evidence · 2021-03-18T23:39:54.444Z · LW · GW

Since we're discussing EMH and VTSAX, seems as good a place to add a recent anecdote:

Chatting with someone, investments came up and they asked me where I put mine. I said 100% VTSAX. Why? Because I think the EMH is as true as it needs to be, I don't understand why markets rise and fall when they do even when I think I'm predicting future events accurately (such as, say, coronavirus), and I don't think I can beat the stock markets, at least not without investing far more effort than I care to. They said they thought it wasn't that hard, and had (unlike me) sold all their stocks back in Feb 2020 or so when most everyone was still severely underestimating coronavirus, and beat the market drops. Very impressive, I said, but when had they bought back in? Oh, they hadn't yet. But... didn't that mean they missed out on the +20% net returns or so of 2020, and had to pay taxes? (VTSAX returned 21% for 2020, and 9.5% thus far for 2021.) Yes, they had missed out. Oops.

Trading is hard.

Comment by gwern on What's a good way to test basic machine learning code? · 2021-03-18T00:28:16.286Z · LW · GW

ALE is doubtless the Atari Learning Environment. I've never seen an 'ALE' in DRL discussions which refers to something else.

Comment by gwern on [AN #142]: The quest to understand a network well enough to reimplement it by hand · 2021-03-17T17:38:46.276Z · LW · GW

It is quite possible that CLIP “knows” that the image contains a Granny Smith apple with a piece of paper saying “iPod”, but when asked to complete the caption with a single class from the ImageNet classes, it ends up choosing “iPod” instead of “Granny Smith”. I’d caution against saying things like “CLIP thinks it is looking at an iPod”; this seems like too strong a claim given the evidence that we have right now.

Yes, it's already been solved. These are 'attacks' only in the most generous interpretation possible (since it does know the difference), and the fact that CLIP can read text in images to, arguably, correctly note the semantic similarity in embeddings, is to its considerable credit. As the CLIP authors note, some queries benefit from ensembling, more context than a single word class name such as prefixing "A photograph of a ", and class names can be highly ambiguous: in ImageNet, the class name "crane" could refer to the bird or construction equipment; and the Oxford-IIIT Pet dataset labels one class "boxer".

Comment by gwern on Kenshō · 2021-03-17T00:19:45.858Z · LW · GW

Harper's has a new article on meditation which delves into some of these issues. It doesn't mention PNSE or Martin by name, but some of the mentioned results parallel them, at least:

...Compared with an eight-person control group, the subjects who meditated for more than thirty minutes per day experienced shallower sleep and woke up more often during the night. The more participants reported meditating, the worse their sleep became... A 2014 study from Carnegie Mellon University subjected two groups of participants to an interview with openly hostile evaluators. One group had been coached in meditation for three days beforehand and the other group had not. Participants who had meditated reported feeling less stress immediately after the interview, but their levels of cortisol—the fight-or-flight hormone—were significantly higher than those of the control group. They had become more sensitive, not less, to stressful stimuli, but believing and expecting that meditation reduced stress, they gave self-reports that contradicted the data.

Britton and her team began visiting retreats, talking to the people who ran them, and asking about the difficulties they’d seen. “Every meditation center we went to had at least a dozen horror stories,” she said. Psychotic breaks and cognitive impairments were common; they were often temporary but sometimes lasted years. “Practicing letting go of concepts,” one meditator told Britton, “was sabotaging my mind’s ability to lay down new memories and reinforce old memories of simple things, like what words mean, what colors mean.” Meditators also reported diminished emotions, both negative and positive. “I had two young children,” another meditator said. “I couldn’t feel anything about them. I went through all the routines, you know: the bedtime routine, getting them ready and kissing them and all of that stuff, but there was no emotional connection. It was like I was dead.”

...Britton’s research was bolstered last August when the journal Acta Psychiatrica Scandinavica published a systematic review of adverse events in meditation practices and meditation-based therapies. Sixty-five percent of the studies included in the review found adverse effects, the most common of which were anxiety, depression, and cognitive impairment. “We found that the occurrence of adverse effects during or after meditation is not uncommon,” the authors concluded, “and may occur in individuals with no previous history of mental health problems.” I asked Britton what she hoped people would take away from these findings. “Comprehensive safety training should be part of all meditation teacher trainings,” she said. “If you’re going to go out there and teach this and make money off it, you better take responsibility. I shouldn’t be taking care of your casualties.”

Comment by gwern on Resolutions to the Challenge of Resolving Forecasts · 2021-03-17T00:07:23.517Z · LW · GW

Why close the markets, though?

Comment by gwern on Resolutions to the Challenge of Resolving Forecasts · 2021-03-16T16:52:02.921Z · LW · GW

In such cases, perhaps the rules would be to pick a probability based on the resolution of past games - with the teams tied, it resolves at 50%, and with one team up by 3 runs in the 7th inning, it resolves at whatever percentage of games where a team is up by 3 runs at that point in the game wins.

Sounds like Pascal's problem of the points, where the solution is to pay out the expected value of winnings, and not merely allocate all winnings to whichever player has the higher probability of victory. Suppose one team has a 51% probability of winning - should the traders who bought that side always get a 100% payoff and the 49% shares be worthless? That sounds extremely distortionary if it happens at all frequently.

Plus quite hard to estimate: if you had a model more accurate than the prediction market, it's not clear why you would be using the PM in the first place. On the other hand, there is a source of the expected value of each share which incorporates all available information and is indeed close at hand: the share prices themselves. Seems much fairer to simply liquidate the market and assign everyone the last traded value of their share.
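To make the distortion concrete, here is a toy sketch (my own illustration, not from the thread; Python, with made-up numbers): compare settling a halted market winner-take-all on the likelier side versus liquidating at the last traded price, against what the shares were actually worth.

```python
import random

def early_settlement_payouts(p=0.51, n_markets=100_000, seed=0):
    """Compare two rules for settling markets halted early, where the
    leading side's true win probability (and last traded price) is p."""
    rng = random.Random(seed)
    # What a 'yes' share was actually worth, on average: simulate the
    # endings the halted games never got to play out.
    true_value = sum(rng.random() < p for _ in range(n_markets)) / n_markets
    wta_payout = 1.0 if p > 0.5 else 0.0  # winner-take-all on the likelier side
    liq_payout = p                        # liquidate at the last traded price
    return true_value, wta_payout, liq_payout

true_value, wta, liq = early_settlement_payouts()
# true_value lands near 0.51: close to the liquidation payout, but ~0.49
# below what winner-take-all hands the 51% shareholders every single time.
```

Winner-take-all systematically overpays the 51% side (and zeroes the 49% side) relative to true value, while liquidation at the last price tracks it; that distortion repeats across every early-settled market.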

Comment by gwern on February 2021 newsletter · 2021-03-13T15:23:57.959Z · LW · GW

No; I've only seen the first season of AoT, if there are armored trains in the rest I am unaware of that. It's actually from someone on either DSL or Naval Gazing, I think, linking to a short history of Zaamurets which is patchy but interesting in its own right.

Comment by gwern on The average North Korean mathematician · 2021-03-11T03:07:55.566Z · LW · GW

To noodle a bit more about tails coming apart: asymptotically, no matter how large r is (so long as r < 1), the probability of a 'double max' (the country that is the top/max on variable A, correlated at r with variable B, also being the top/max on B) decreases toward the chance level of 1/n. The decay is actually quite rapid: even with small samples you need r > 0.9 to get anywhere.

A concrete example here: you can't get 100%, but let's say we only want a 50% chance of a double-max. And we're considering just a small sample like 192 (roughly the number of countries in the world, depending on how you count). What sort of r do we need? We turn out to need r ~ 0.93! There are not many correlations like that in the social sciences (not even when you are taking multiple measurements of the same construct).

Some R code to Monte Carlo estimates of the necessary r for n = 1-193 & top-p = 50%:

library(MASS)     # for 'mvrnorm'
library(parallel); library(plyr) # parallelism
library(ggplot2)  # for 'qplot'

## P(the same observation is the max on both variables) in a bivariate-normal
## sample of size n with correlation r, estimated by Monte Carlo:
p_max_bivariate_montecarlo <- function(n, r, iters=60000) {
    mean(replicate(iters, {
        sample <- mvrnorm(n, mu=c(0,0), Sigma=matrix(c(1, r, r, 1), nrow=2))
        which.max(sample[,1]) == which.max(sample[,2]) })) }

## Solve for the r yielding a target double-max probability (default: 50%):
find_r_for_topp <- function(n, topp_target=0.5) {
  r_solver <- function(r) { abs(topp_target - p_max_bivariate_montecarlo(n, r)) }
  optim(0.925, r_solver, method="Brent", lower=0, upper=1)$par }

rs <- ldply(mclapply(2:193, find_r_for_topp))$V1
# c(0.0204794413, 0.4175067131, 0.5690806174, 0.6098019663, 0.6994770020, 0.7302042200, 0.7517989571, 0.7652371794, 0.7824824776, 0.7928299227, 0.7911903664, 0.8068905240, 0.8177673342, 0.8260679686, 0.8301939461, 0.8258472869, 0.8314810573, 0.8457114147, 0.8477265340, 0.8599239760, 0.8541010795, 0.8539345369, 0.8578597015, 0.8581440013, 0.8584451493, 0.8612079626, 0.8640382310, 0.8693895810, 0.8681881832, 0.8540880634, 0.8688769562, 0.8734774025, 0.8762371597, 0.8737293740, 0.8791205385, 0.8798232797, 0.8780001174, 0.8813006544, 0.8795942424, 0.8809224752, 0.8789012597, 0.8826072026, 0.8820106235, 0.8833360963, 0.8880434608, 0.8865534542, 0.8860206658, 0.8909726985, 0.8918581133, 0.8896068426, 0.8931125753, 0.8915826504, 0.8881211032, 0.8860882133, 0.8857084275, 0.8962766690, 0.8921903730, 0.8942188090, 0.8969666799, 0.8926138586, 0.8971690171, 0.8946804108, 0.8973194094, 0.8942509678, 0.8999695035, 0.8965944860, 0.8961380935, 0.8940129777, 0.9032449177, 0.9008863181, 0.9032217868, 0.9005629127, 0.9020274591, 0.8959058553, 0.9021526115, 0.9039115040, 0.9011588080, 0.9035249155, 0.9017018519, 0.9055311291, 0.9050712304, 0.9090986369, 0.9102189075, 0.9058648333, 0.9062347968, 0.9036232208, 0.9098300563, 0.9104166481, 0.9082378601, 0.9097509415, 0.9072401723, 0.9110669707, 0.9097015650, 0.9095392911, 0.9104547321, 0.9109965730, 0.9105344751, 0.9113974777, 0.9098016391, 0.9108745395, 0.9096074058, 0.9101558716, 0.9114150600, 0.9098197705, 0.9140866653, 0.9110598057, 0.9098305291, 0.9126945140, 0.9116794250, 0.9098304525, 0.9162597410, 0.9138880049, 0.9166744242, 0.9115174937, 0.9098300563, 0.9137427958, 0.9154025570, 0.9098300563, 0.9153743094, 0.9121638454, 0.9098300563, 0.9124202538, 0.9150891460, 0.9155692284, 0.9154097048, 0.9148239514, 0.9135391377, 0.9134265701, 0.9184868581, 0.9155030511, 0.9160840080, 0.9156142020, 0.9180363741, 0.9133847724, 0.9178412895, 0.9164848154, 0.9185051043, 0.9186443572, 0.9163631983, 0.9067252079, 0.9171935358, 0.9068669658, 
# 0.9172083988, 0.9216221015, 0.9173032657, 0.9161656322, 0.9193769687, 0.9196134184, 0.9189703040, 0.9168335043, 0.9208238293, 0.9176496818, 0.9177692888, 0.9193447026, 0.9083813817, 0.9171593478, 0.9207227165, 0.9215861226, 0.9094130225, 0.9197835707, 0.9175185705, 0.9207226893, 0.9213173454, 0.9211625233, 0.9187349438, 0.9094856342, 0.9218536229, 0.9213765908, 0.9216097564, 0.9215764567, 0.9098389885, 0.9098265564, 0.9217230988, 0.9219802481, 0.9226050491, 0.9174997507, 0.9098423672, 0.9208316851, 0.9219666398, 0.9213117029, 0.9227359249, 0.9107645063, 0.9217438628, 0.9225905693, 0.9220370631, 0.9259721234, 0.9225535447, 0.9249239773, 0.9256348191, 0.9232228035, 0.9101015711, 0.9253350470)
qplot(1:192, rs) + coord_cartesian(ylim=c(0,1)) + ylab("Necessary bivariate correlation (Pearson's r)") + xlab("Population size") + ggtitle("Necessary correlation strength for ~50% chance of double-max on 2 correlated variables (Monte Carlo)") + theme_linedraw(base_size = 24) + geom_point(size=4)

Comment by gwern on The average North Korean mathematician · 2021-03-10T19:17:57.759Z · LW · GW

The tails coming apart is "Nigeria has the best Scrabble players in the world, but the persons with the richest English vocabulary in the world are probably not Nigerian"

No. The tails coming apart here would be "gameplaying of game A correlates with national variable B but the top players of game A are not from the top country on variable B".

I say it's borderline circular because while they aren't the same explanation, they can be made trivially the same depending on how you shuffle your definitions to save the appearances. For example, consider the hypothesis that NK has exactly the same distribution of math talent as every other country of similar GDP, the same mean/SD/etc, but they have a more intense selection process recruiting IMO participants. This is entirely consistent with tails coming apart ("yes, there is a correlation between GDP and IMO, but it's r<1 so we are not surprised to see residuals and overperformance which happens to be NK in this case, which is due to difference in selection process"), but not with the distributional hypothesis - unless we post hoc modify the distribution hypothesis, "oh, I wasn't talking about math talent distributions per se, ha ha, you misunderstood me, I just meant, IMO participant distribution; who cares where that distribution difference comes from, the important thing is that the NK IMO participant distribution is different from the other countries' IMO participant distributions, and so actually this only proves me right all along!"

Comment by gwern on The average North Korean mathematician · 2021-03-10T19:11:35.122Z · LW · GW

There are many countries besides Nigeria where English is an official language, elite language, or widely taught. And language proficiency apparently has little to do with Scrabble success at pro levels where success depends on memorizing an obsolete dictionary's words (apparently even including not really-real words, to the point where I believe someone won the French Scrabble world championship or something without knowing any French beyond the memorized dictionary words).

Comment by gwern on Above the Narrative · 2021-03-10T01:00:40.876Z · LW · GW

I assume you're referring to the 'vault' thing WP mentions there as "Recently credited by Alan Sherman"? Then no, Chaum is irrelevant to Satoshi except inasmuch as his Digicash was a negative example to the cypherpunks about the vulnerability of trusted third parties & centralization to government interference & micromanagers (some of whom, like Szabo, worked for him). The vault thing didn't inspire Satoshi because it inspired no one; if it had, it wouldn't need any Alan Sherman to dig it up in 2018. You will not find it cited in the Bitcoin whitepaper, it was never mentioned in any of the early mailing list discussions or private emails, it is not in any of Szabo's essays, it's not in the Cyphernomicon, etc etc. Nor could anyone have easily gotten it, as it wasn't published and wasn't available online then or apparently until quite recently (given that the IA has no mirrors of the copy on Chaum's website - I've added a direct link in the WP article so hopefully availability will improve). In fact, this is the very first time I've so much as heard of it. If Satoshi 'got most of his ideas from the academy', it was definitely a different part of the academy... Chaum was irrelevant*.

Claiming Chaum's vault directly inspired Satoshi is just the typical academic colonizing practice of post hoc ergo propter hoc in fabricating an intellectual pedigree for a working system (Schmidhuber being the most infamous practitioner of this particular niche); it is not true, as a matter of causality or history. (And to their credit, they admit that, like most unpublished theses which are promptly buried in the university library never to be read again, it went "largely unnoticed", which is rather an understatement; looking at the citations of it in GS, they are all secret-sharing related, ignoring any proto-blockchain aspect, and skimming a few, I doubt any of the citers actually read it, which is pretty typical especially for hard-to-get theses.)

* Actually, I'd say Chaum's ideas were a huge obstacle for Satoshi. My read of the e-cash literature is that he was a deeply negative influence in creating a mathematically-seductive dead end that academics could, and did, mine for decades, coming up with countless subtle variants. But no amount of moon math turns Chaumian blinded credentials into Bitcoin. Satoshi's success could only have come from ignoring the entire literature springing from Chaum and coming up with a fundamentally different approach. Given the profoundly negative reaction to Bitcoin even among non-academics not sworn to Chaumian approaches, I am unable to imagine Bitcoin ever arising in American academia. That's just a radically ahistorical reading which requires assuming that anything which can be remotely associated with academics must be solely causally due to them.

Comment by gwern on The average North Korean mathematician · 2021-03-07T23:48:58.559Z · LW · GW

While the greater male variance hypothesis, and tail effects in general, are always interesting, I'm not sure if it's too illuminating here. It is not surprising that there are some weird outliers at the top of the IMO list, 'weird' in the sense of 'outperforming' what you'd expect given some relevant variable like GDP, intellectual freedom, HDI index, national IQ, or whatever. That's simply what it means for the correlation between IMO scores & that variable to be <1. If the IMO list were an exact rank-order correspondence, then the correlation would =1; but no one would have predicted that, because we know in the real world all such correlations are <1, and that means that some entries must be higher than expected in the list (and some lower). There's always a residual. (This is part of why tests and measures can be gamed: because the latent variable, which is what we're really interested in, is not absolutely identical in every way to the measure itself, and every difference is a gap into which optimizing agents can jam a wedge.)

When North Korea places high despite being an impoverished totalitarian dictatorship routinely struggling with malnutrition and famine, it's just the tails coming apart. If we are curious, we can look for an additional variable to try to explain that residual.

For example, on a lot of economic indexes like GDP, Saudi Arabia places high, despite being a wretched place in many respects; does that mean that whipping women for going out in public is good for economic growth? No, it just means that having the blind idiot luck to be floating on a sea of unearned oil lets you be rich despite your medieval policies and corruption. (Although, as Venezuela demonstrates, even a sea of oil may not be enough if your policies are bad enough.) SA does badly on many variables other than GDP which cannot be so easily juiced with oil revenue by the state. Similarly, at the Olympics, Warsaw Pact countries infamously won many gold medals & set records. Does that mean the populations were extremely healthy and well-fed and happy? No, illegal doping and hormone abuse and coercion and professionalized state athletics aimed solely at Olympic success probably had something to do with that. Their overperformance disappeared, and they didn't show such overperformance in anything else you might expect to be related to athletics, like non-Olympic sports, popular pro sports/entertainment, or life expectancy. Or, as respectable as Russian chess players were beforehand, the Russian school of chess, particularly in the Cold War, could never have prospered the way it did without extensive state support (potentially literally, given the accusations of employing espionage techniques and other cheating*), as a heavily-subsidized, propagandized domestically & overseas, professionalized program with lifetime employment, major perks like overseas travel, safety from persecution due to politically-connected patrons, and the sheer lack of much better opportunities elsewhere. But many other areas suffered, and like so many things in the USSR (like the Moscow subway?), the chess served as a kind of Potemkin village. More recently, Nigeria boasts an unusual number of Scrabble champions; is Nigeria actually bursting with unrealized potential? Probably not, because they don't dominate any other competitive game such as chess or checkers or poker, or intellectual pursuits in general, and Nigerian Scrabble seems to be path-dependence leading to specialization; you can easily win the annual per capita income of Nigeria at Scrabble tournaments, and there is now a self-sustaining Scrabble community telling you it's a doable career and providing an entryway. Weird, but there's a lot of games and countries out there, and one is always stumbling across strange niches, occupations, and the like which emphasize the role of chance in life.

* see Oscar's comment about NK IMO cheating, which I didn't know about, but am entirely unsurprised by.

North Korea's IMO overperformance looks like it's about the same thing as Soviet chess or Warsaw Pact athletics in general. I don't know what benefits they get (do their families get to change castes, and move to Pyongyang? immunity from prison camps? how useful is the overseas travel to them? is it a feeder into the bubble of the nuclear program? how much financial support and specialized study and tutors do they get?), but I would bet a lot that the relative benefits for a NK kid who wins at the IMO are vastly larger than for a soft suburban kid from a US magnet high school who has never attended a public execution or gone hungry, and at most gets another resume item for college. (I've seen more than one IMO competitor note that IMO is not really reflective of 'real' math, but is its own sort of involuted discipline; always a risk in competitions, and seems to have afflicted the much-criticized Cambridge Old Tripos.) This is what juices the residual: almost all countries exert merely an ordinary endogenous sort of IMO effort, and only a few see it as one of the priorities to invest a maximum effort into. NK, it turns out, sees it as a priority, like building statues, I guess. The only remaining question here about the NK IMO residual is the historical contingency: how did NK happen to make IMO one of its 'things'? Is it merely its typical envy-hatred towards China, because China for its own reasons targeted the IMO?

You can shoehorn this into a distributional argument, but when you don't know which of the moments is changing (mean? SD? skew?), or even what the distribution might be (filtering or selecting from a normal does not yield a normal), I don't find it too helpful and borderline circular. ("Why is NK performance on IMO high? Because their IMO performance distribution has a higher mean. How do we know that? Because their IMO performance is high.") Pointing at the imperfect bivariate correlation and analyzing the possible causes of a residual is much more informative. When you look at the state involvement in IMO, it explains away any apparent contradiction with what you believed about correlations between intellectual achievement and GDP or whatever.

Comment by gwern on Fun with +12 OOMs of Compute · 2021-03-07T00:36:16.631Z · LW · GW

One man's a priori is another man's a posteriori, one might say; there are many places one can acquire informative priors... Learning 'tacit knowledge' can be so fast as to look instantaneous. An example here would be OA's Dactyl hand: it learns robotic hand manipulation in silico, using merely a model simulating physics, with a lot of randomization of settings to teach it to adapt on the fly to whatever new model it finds itself in. This enables it, without ever once training on an actual robot hand (only simulated ones), to successfully run on an actual robot hand after seconds of adaptation. Another example might be PILCO: it can learn your standard "Cartpole" task within just a few trials by carefully building a Bayesian model and picking maximally informative experiments to run. (Cartpole is quite difficult for a human, incidentally; there's an installation of one in the SF Exploratorium, and I just had to try it out once I recognized it. My sample-efficiency was not better than PILCO's.) Because the Phites have all that computation and observations of the real world, they too can do similar tricks, and who knows what else we haven't thought of.

Comment by gwern on Takeaways from one year of lockdown · 2021-03-01T22:44:16.858Z · LW · GW

I was recently tracking down a reference in the Sequences and found that the author was so afraid of COVID that he failed to seek medical care for appendicitis and died of sepsis.

Wow! Who was that?

and the faint but pretty smell of vanilla.

I think you mean "...and a presumption that once our eyes watered." (As time passes, this is increasingly how I feel about my grandmother dying of coronavirus.)

Comment by gwern on Mentorship, Management, and Mysterious Old Wizards · 2021-02-26T01:03:46.742Z · LW · GW

Michael Nielsen calls something similar "volitional philanthropy", with some examples.

Comment by gwern on What happens to variance as neural network training is scaled? What does it imply about "lottery tickets"? · 2021-02-24T03:08:08.390Z · LW · GW

'Variance' is used in an amusing number of ways in these discussions. You use 'variance' in one sense (the bias-variance tradeoff), but "Explaining Neural Scaling Laws", Bahri et al 2021 talks about a different kind of variance limit in scaling, while "Learning Curve Theory", Hutter 2001's toy model provides statements on yet other kinds of variance about scaling curves themselves (and I think you could easily dig up a paper from the neural tangent kernel people about scaling approximating infinite-width models which only need to make infinitesimally small linear updates or something like that because variance in a different sense goes down...). Meanwhile, my original observation was about the difficulty of connecting benchmarks to practical real-world capabilities: regardless of whether the 'variance of increases in practical real-world capabilities' goes up or down with additional scaling, we still have no good way to say that an X% increase on benchmarks ought to yield qualitatively new capability Y - almost a year later, still no one has shown how you would have predicted in advance that pushing GPT-3 to a particular likelihood loss would yield all these cool new things. As we cannot predict that at all, it would not be of terribly much use to say whether it either increases or decreases as we continue scaling (since either way, we may wind up being surprised).

Comment by gwern on Meetup Notes: Ole Peters on ergodicity · 2021-02-23T18:46:14.712Z · LW · GW

So, can we steelman the claims that expected utility theory is wrong? Can we find a decision procedure which is consistent with the Peters' general idea, but isn't just log-wealth maximization?

Yes. As I've pointed out before, a lot of these problems go away if you simply solve the actual problem instead of a pseudo-problem. Decision theory, Bayesian decision theory included, has no problem with multi-step processes like POMDPs/MDPs - or at least, I have yet to see anyone explain what, if anything, of Peters/Taleb's 'criticisms' of expected value goes away if you actually solve the corresponding MDP. (Bellman did it better 70 years ago.)
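As a minimal sketch of that point (my own illustration, assuming a repeated even-money bet won with probability 0.6; Python): solving the sequential problem by ordinary Bellman backward induction over the expected log of final wealth recovers the Kelly fraction directly, with no special 'ergodicity' machinery.

```python
import math

def optimal_fraction(p_win=0.6, horizon=10):
    """Backward induction over a repeated even-money bet: each round, stake
    a fraction f of current wealth; maximize the expected log of final
    wealth.  Log-wealth is additive across rounds, so the Bellman
    maximization decouples and the same f is optimal every round."""
    fractions = [i / 1000 for i in range(991)]  # candidate stakes f in [0, 0.99]
    def step_value(f):  # one round's expected gain in log-wealth
        return p_win * math.log(1 + f) + (1 - p_win) * math.log(1 - f)
    value = 0.0
    policy = []
    for _ in range(horizon):  # backward induction: V_t = max_f [gain(f) + V_{t+1}]
        best_f = max(fractions, key=step_value)
        policy.append(best_f)
        value = step_value(best_f) + value
    return policy[0], value

f_star, value = optimal_fraction()
# f_star comes out at 0.2, i.e. the Kelly fraction 2*p_win - 1 for p_win = 0.6.
```

The point is not that log utility is mandatory; it's that once you write down the actual sequential decision problem and solve it with the standard machinery, the 'ergodic' answer falls out.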