## Posts

Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal 2020-01-08T22:20:20.290Z · score: 58 (13 votes)
"AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence", Clune 2019 2019-09-10T21:33:08.837Z · score: 14 (4 votes)
August 2019 gwern.net newsletter (popups.js demo) 2019-09-01T17:52:01.011Z · score: 12 (4 votes)
"Designing agent incentives to avoid reward tampering", DeepMind 2019-08-14T16:57:29.228Z · score: 29 (9 votes)
How Should We Critique Research? A Decision Perspective 2019-07-14T22:51:59.285Z · score: 49 (12 votes)
On Seeing Through 'On Seeing Through: A Unified Theory': A Unified Theory 2019-06-15T18:57:25.436Z · score: 27 (7 votes)
On Having Enough Socks 2019-06-13T15:15:21.946Z · score: 21 (6 votes)
"One Man's Modus Ponens Is Another Man's Modus Tollens" 2019-05-17T22:03:59.458Z · score: 34 (5 votes)
"Everything is Correlated": An Anthology of the Psychology Debate 2019-04-27T13:48:05.240Z · score: 49 (7 votes)
'This Waifu Does Not Exist': 100,000 StyleGAN & GPT-2 samples 2019-03-01T04:29:16.529Z · score: 39 (12 votes)
"Forecasting Transformative AI: An Expert Survey", Gruetzemacher et al 2019 2019-01-27T02:34:57.214Z · score: 17 (8 votes)
"AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] 2019-01-24T20:49:01.350Z · score: 62 (23 votes)
Visualizing the power of multiple step selection processes in JS: Galton's bean machine 2019-01-12T17:58:34.584Z · score: 27 (8 votes)
Littlewood's Law and the Global Media 2019-01-12T17:46:09.753Z · score: 37 (8 votes)
Evolution as Backstop for Reinforcement Learning: multi-level paradigms 2019-01-12T17:45:35.485Z · score: 18 (4 votes)
Whole Brain Emulation & DL: imitation learning for faster AGI? 2018-10-22T15:07:54.585Z · score: 15 (5 votes)
Genomic Prediction is now offering embryo selection 2018-10-07T21:27:54.071Z · score: 39 (14 votes)
$5m cryptocurrency donation to Alcor by Brad Armstrong in memory of LWer Hal Finney 2018-05-17T20:31:07.942Z · score: 48 (12 votes)
Tech economics pattern: "Commoditize Your Complement" 2018-05-10T18:54:42.191Z · score: 98 (28 votes)
April links 2018-05-10T18:53:48.970Z · score: 20 (6 votes)
March gwern.net link roundup 2018-04-20T19:09:29.785Z · score: 27 (6 votes)
Recent updates to gwern.net (2016-2017) 2017-10-20T02:11:07.808Z · score: 7 (7 votes)
The NN/tank Story Probably Never Happened 2017-10-20T01:41:06.291Z · score: 2 (2 votes)
Regulatory lags for New Technology [2013 notes] 2017-05-31T01:27:52.046Z · score: 5 (5 votes)
"AIXIjs: A Software Demo for General Reinforcement Learning", Aslanides 2017 2017-05-29T21:09:53.566Z · score: 4 (4 votes)

## Comments

Comment by gwern on Are veterans more self-disciplined than non-veterans? · 2020-03-23T14:11:36.296Z · score: 29 (10 votes) · LW · GW

Draft lotteries provide randomized experiments. The most famous is the Vietnam draft analysis by Angrist 1990, where he finds a 15% penalty to lifetime income caused by being drafted. There's also little evidence of any benefit in Low-Aptitude Men In The Military: Who Profits, Who Pays?, Laurence & Ramberger 1991, which covers, among others, Project 100,000, which drafted individuals who would not otherwise have been drafted, either as a matter of policy or, in the ASVAB Misnorming, by unrealized accident. (As Fredrik deBoer likes to say, if there's any way it can possibly be a selection effect, then yeah, it's a selection effect.)

Comment by gwern on SARS-CoV-2 pool-testing algorithm puzzle · 2020-03-22T14:33:57.103Z · score: 6 (3 votes) · LW · GW

It describes a lot of things, and I'm not sure which ones would be the right answer here. If anyone wants to read through and describe which one seems to best fit the corona problem, that'd be a better answer than just providing an enormous difficult-to-read monograph.
Comment by gwern on Evaluability (And Cheap Holiday Shopping) · 2020-03-22T14:31:22.459Z · score: 2 (1 votes) · LW · GW

> No, one shouldn't. Playing a game of chance once or a thousand times does not influence the outcome of the next round (aka the gambler's fallacy). If a bet is a good idea one time, it's a good idea a thousand times. And if a bet is a good idea a thousand times, it is a good idea the first time. How could you consider betting a thousand times to be a good idea, if you think each individual bet is a bad idea?

Gambler's ruin. The bets are the same, but you are not. If bets are rare enough or large enough, they do not justify the simplifying assumption of ergodicity ('adding up to normality' in this case), and greedy expected-value maximization is not optimal.

Comment by gwern on What would be the consequences of commoditizing AI? · 2020-03-22T00:50:44.703Z · score: 5 (2 votes) · LW · GW

I'm not clear here what you mean by "AI". Full AGI, or just AI like the present (various specialized systems, usually, but not always, with subhuman capabilities)? If the former, it's hard to speculate. The latter, however, is already being commoditized, and I include it as an example. FANG practically compete to see who can give away more source code & datasets & research and open-source everything, to the extent that when someone announces a release of a new reinforcement-learning framework library, I don't even bother announcing it on /r/reinforcementlearning unless there's something special. FANG is not threatened by open-sourcing everything, because all the benefits come from the integration into their ecosystem as part of your smartphone, running on the ASICs built into your smartphone, with access to all the data in your account and the private data in all the FANG datacenters, which power your upfront payments and then advertising eyeballs later.
Someone who invents a better image classifier cannot in any meaningful way threaten Google's Android browser advertising revenues, but does help make Android image apps somewhat better and increases Google revenue indirectly, and so on. Thus, Google can release not just new research but the actual models themselves, like EfficientNet, and it's no problem. You can provide a free competitor, but Google provides something which is "better than free". As far as 'tool AI' or 'AI services' go, this seems to be the overall theme. Most tasks themselves are not too valuable. The question is what can you build on top of or around it, and what does it unlock? (https://www.overcomingbias.com/2019/12/automation-as-colonization-wave.html is interesting.)

Comment by gwern on SARS-CoV-2 pool-testing algorithm puzzle · 2020-03-20T17:13:51.811Z · score: 26 (11 votes) · LW · GW

If you want to get really hardcore in reading up on pool/group testing, there's a recent monograph: "Group Testing: An Information Theory Perspective: chapter 3: Algorithms for Noisy Group Testing", Aldridge et al 2019.

Comment by gwern on What information, apart from the connectome, is necessary to simulate a brain? · 2020-03-20T15:34:17.835Z · score: 5 (2 votes) · LW · GW

One question for WBE is whether we really need to upload a specific brain, or if it is enough to use a generic brain as a template and a kind of informative prior to greatly speed up alternative AI techniques (an emerging paradigm I'm calling "brain imitation learning" for now).

Comment by gwern on Open & Welcome Thread - March 2020 · 2020-03-19T15:24:58.805Z · score: 7 (4 votes) · LW · GW

https://magazine.atavist.com/promethea-unbound-child-genius-montana comes to mind as a cautionary example.
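The gambler's-ruin point in the Evaluability comment above can be made concrete with a simulation: a bet can have positive expected value every single time and still frequently bankrupt a bettor whose bankroll is small relative to the stake. A minimal sketch (all numbers here are my own illustrative assumptions):

```python
import random

def ruin_probability(bankroll: float, stake: float, p_win: float,
                     payout: float, n_bets: int, trials: int = 2000) -> float:
    """Fraction of simulated bettors who go broke before completing n_bets,
    staking a fixed amount each round."""
    random.seed(0)  # deterministic, for reproducibility
    ruined = 0
    for _ in range(trials):
        b = bankroll
        for _ in range(n_bets):
            b -= stake
            if random.random() < p_win:
                b += stake * payout
            if b < stake:        # cannot cover the next bet: ruined
                ruined += 1
                break
    return ruined / trials

if __name__ == "__main__":
    # A $10 bet returning $25 half the time: EV = +$2.50 per play.
    # The bet is identical each time, but the bettor is not:
    print(ruin_probability(20, 10, 0.5, 2.5, 1000))    # small bankroll: ruin common
    print(ruin_probability(1000, 10, 0.5, 2.5, 1000))  # large bankroll: ruin rare
```

With a $20 bankroll, just losing the first two rounds (probability 25%) already means ruin, despite the positive per-bet expectation; with $1,000, the positive drift dominates and ruin is essentially never observed. That asymmetry is exactly what the ergodicity assumption sweeps under the rug.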
Comment by gwern on The correct response to uncertainty is *not* half-speed · 2020-03-19T01:00:06.698Z · score: 9 (5 votes) · LW · GW

Happens all the time in decision theory & reinforcement learning: the average of many good plans is often a bad plan, and a bad plan followed to the end is often both more rewarding & informative than switching at every timestep between many good plans. Any kind of multi-modality or need for extended plans (eg due to upfront costs/investments) will do it, and exploration is quite difficult - just taking the argmax or adding some randomness to action choices is not nearly enough; you need "deep exploration" (as Osband likes to call it) to follow a specific hypothesis to its limit. This is why you have things like 'posterior sampling' (a generalization of Thompson sampling), where you randomly pick from your posterior of world-states and then follow the optimal strategy assuming that particular world-state. (I cover this a bit in two of my recent essays, on startups & socks.)

Comment by gwern on [AN #91]: Concepts, implementations, problems, and a benchmark for impact measurement · 2020-03-18T20:06:10.972Z · score: 4 (2 votes) · LW · GW

> On the other hand, OpenAI Five (AN #13) also has many, many subtasks, that in theory should interfere with each other, and it still seems to train well.

True, but OA5 is inherently a different setup than ALE. Catastrophic forgetting is at least partially offset by the play against historical checkpoints, which doesn't have an equivalent in your standard ALE; the replay buffer typically turns over, so old experiences disappear, and there are no adversarial dynamics or AlphaStar-style population of agents which can exploit forgotten areas of state-space. Since Rainbow is an off-policy DQN, I think you could try saving old checkpoints and periodically spending a few episodes running old checkpoints and adding the experience samples to the replay buffer, but that might not be enough. There's also the batch size.
The OA5 batch size was ridiculously large. Given all of the stochasticity in a DoTA2 game & additional exploration, that covers an awful lot of possible trajectories.

(In Gmail, everything after "They also present some fine-grained experiments which show that for a typical agent, training on specific contexts adversely affects performance on other contexts that are qualitatively different." is cut off by default due to length.)

Comment by gwern on Welcome to LessWrong! · 2020-03-18T00:51:44.596Z · score: 2 (1 votes) · LW · GW

If you're interested in LW2's typography, you should take a look at GreaterWrong, which offers a different and much more old-school non-JS take on LW2, with a number of features like customizable CSS themes. There is a second project, ReadTheSequences.com, which focuses on a pure non-interactive typography-heavy presentation of a set of highly influential LW1 posts. Finally, there's been cross-pollination between LW2/GW/RTS and my own website (description of design).

Comment by gwern on A practical out-of-the-box solution to slow down COVID-19: Turn up the heat · 2020-03-15T17:16:36.183Z · score: 9 (6 votes) · LW · GW

Malaysia is up to 428 cases now and rising rapidly: https://www.bloomberg.com/news/articles/2020-03-15/malaysia-virus-cases-spike-after-outbreak-at-16-000-strong-event They've been averaging 24°C with peaks of 36°C. Not looking good for the heat hypothesis.

Comment by gwern on Why the tails come apart · 2020-03-12T15:49:20.147Z · score: 2 (1 votes) · LW · GW

Comment by gwern on Assessing Kurzweil: the results · 2020-03-04T02:33:27.053Z · score: 24 (10 votes) · LW · GW

As another decade has passed, his 2019 predictions from 1999 could be graded too: https://en.wikipedia.org/wiki/Predictions_made_by_Ray_Kurzweil#2019 Skimming them, there are a lot of total misses (eg computer chips being carbon nanotubes - not even on the horizon as of 2020; silicon is king), but there's also a fair number which came true only just recently.
For example, a lot of the software ones, like speech recognition/synthesis/transcription and being able to wear glasses with real-time captioning and using machine-translation apps in ordinary conversation, really only came true within the last few years, and I don't think this would have been predicted in 1999 based purely on trend extrapolation from n-grams machine translation or HMM voice recognition. Likewise, even allowing for "genetic algorithms" being wrong, "Massively parallel neural nets and genetic algorithms are in wide use." would have been dismissed. I thought while grading his 2010 predictions back then that Kurzweil's errors seemed to be to overestimate how much hardware and biology would change, but his software-related projections were generally much better, and that seems even truer of his 2019 predictions.

Comment by gwern on "What Progress?" Gwern's 10 year retrospective · 2020-03-02T17:40:04.392Z · score: 17 (8 votes) · LW · GW

It's still more of a draft than a finished writeup. I'll be sending out the newsletter when it's officially done, as always.

Comment by gwern on Value of the Long Tail · 2020-02-26T18:03:20.375Z · score: 7 (3 votes) · LW · GW

Comment by gwern on (a) · 2020-02-20T23:15:28.911Z · score: 10 (3 votes) · LW · GW

> An open question for me is whether it makes sense to not pre-emptively archive everything.

Update: I ultimately decided to give this a try, using SingleFile to capture static snapshots. Detailed discussion: https://www.reddit.com/r/gwern/comments/f5y8xl/new_gwernnet_feature_local_link_archivesmirrors/ It currently costs ~5,300 links/20GB, which is not too bad but may motivate me to find new hosting, as I think it will substantially increase my monthly S3 bandwidth bill. The snapshots themselves look pretty good, and no one has reported serious problems yet... Too early to say, but I'm happy to be finally giving it a try.
Comment by gwern on The Personality of (great/creative) Scientists: Open and Conscientious · 2020-02-20T00:12:18.150Z · score: 5 (2 votes) · LW · GW

"Scientists are curious and passionate and ready to argue":

> This psychological assessment owes nothing to surveys or personality testing; it pays no heed to the zodiac. Instead, researchers took the linguistic data from 200 tweets each of nearly 130,000 Twitter users across more than 3,500 occupations to assess their "personality digital fingerprints". They used machine learning to identify the traits and values that distinguish professions from each other. This "21st century approach for matching one's personality with congruent occupations," dubbed the robot career adviser, is more reliable than existing career-guidance methods based on self-reports through questionnaires, the researchers argue in a January paper in Proceedings of the National Academy of Sciences. Personality characteristics of Twitter users were inferred using IBM Watson's Personality Insights tool. The study focused on five specific traits (extraversion, agreeableness, conscientiousness, emotional stability, and openness) and five values (helping others, tradition, taking pleasure in life, achieving success, and excitement). The analysis revealed that scientists combine low agreeableness and low conscientiousness with high openness. "The combination is characteristic of people who tend to be unconventional and quirky, consistent with the image of scientists as curious and even sometimes eccentric boffins," says one of the authors, Paul McCarthy, an adjunct professor at the University of New South Wales (UNSW Sydney) in Australia. "In some ways it does confirm stereotypes." Within the sciences, a spectrum of personality traits emerged. Those dealing with more abstract or inanimate things (mathematicians, geologists) were more open than those in the life sciences (bio-statisticians, horticulturalists), who "tended to be more extroverted and agreeable", says McCarthy. Scientists and software programmers, whose personality characteristics aligned closely, were generally more open to experiencing a variety of new activities, tended to think in symbols and abstractions, and found repetition boring, the researchers found. On the spectrum of occupations, scientists are especially different from professional tennis players. "Tennis professionals are a lot more agreeable and conscientious than all others in the study - especially scientists," says McCarthy. "It makes sense, because to be a tennis player, you have to be highly conscientious and be willing to take direction, whereas scientists are almost the complete opposite. They don't take direction, their openness to experiences is very high, but so is their openness to being disagreeable." He noted that the occupation of research director had the highest median openness scores of any of the 3,513 occupations in the study.

Comment by gwern on [deleted post] 2020-02-18T14:24:08.586Z

> My understanding was that distilling CNNs worked more-or-less by removing redundant weights, rather than by discovering a more efficient form of representing the data.

No. That might describe sparsification, but it doesn't describe distillation, and in either case, it's shameless goalpost-moving - by handwaving away all the counterexamples, you're simply no-true-Scotsmanning progress. 'Oh, Transformers? They aren't real performance improvements because they just learn "good representations of the data". Oh, model sparsification and compression and distillation? They aren't real compression because they're just getting rid of "wasted information".'

Comment by gwern on [deleted post] 2020-02-18T01:40:06.057Z

> Of course, most of us would be very skeptical. Not just because insights of that magnitude are rarely ever discovered by a single person or small team of people, but also because it's hard to see how there could be a simple core to image classification. The reason why you can recognize a cat is not because cats are simple things in thingspace and are therefore easily identifiable; it's because there are a bunch of things that make cats cat-like, and you understand a lot about the world. Current image classifiers recognize cats because they have learned a bunch of features: whiskers, ears, legs, fur, eyes, tails, etc., and they leverage this learned knowledge to identify cats. Humans recognize cats because they have learned a bunch of information about animals, bodies, moving objects, and some domain-specific information about cats, and they leverage this learned knowledge to identify cats. Either way, there's no way around the fact that you need to know a lot in order to understand what is and isn't a cat. Image classification just isn't the type of thing that should be easily compressible, because by compressing it, you lose important learned information that can be used to identify features of the world. In fact, I think we can say the same about many areas of intelligence.

According to you, the entire field of model distillation & compression, whose paradigmatic use-case is compressing image-classification CNNs down to sizes like 10% or 1% (or less) and running them on your smartphone, which is not even that hard in practice, is impossible and cannot exist. That seems a little puzzling.
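For readers unfamiliar with how distillation differs from merely removing redundant weights: the small student model is trained to match the large teacher's temperature-softened output distribution, not to copy its parameters. A minimal sketch of the Hinton-style soft-target loss (the numbers and names are my own illustration):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target loss: cross-entropy between the teacher's softened
    distribution and the student's, scaled by T^2 so gradient magnitudes
    stay comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -(T ** 2) * sum(pi * math.log(qi) for pi, qi in zip(p, q))

if __name__ == "__main__":
    teacher = [5.0, 1.0, 0.0]
    print(distillation_loss([5.0, 1.0, 0.0], teacher))  # matches teacher: low
    print(distillation_loss([0.0, 1.0, 5.0], teacher))  # mismatched: higher
```

The high temperature exposes the teacher's "dark knowledge" (the relative probabilities of the wrong classes), which is exactly the information a much smaller network can absorb without ever seeing the teacher's weights.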
Comment by gwern on Inverse p-zombies: the other direction in the Hard Problem of Consciousness · 2020-02-17T22:40:43.494Z · score: 5 (2 votes) · LW · GW

"Anesthetizing the Public Conscience: Lethal Injection and Animal Euthanasia", Alper 2008: Alper reviews curare etc. and argues that US lethal injection, because of the use of paralytics and slow potassium poisons rather than quick, effective, standard veterinarian-style sodium pentothal injections, is manufactured anesthesia awareness:

> No inmate has ever survived a botched lethal injection, so we do not know what it feels like to lie paralyzed on a gurney, unable even to blink an eye, consciously suffocating, while potassium burns through the veins on its way to the heart, until it finally causes cardiac arrest. But aided by the accounts of people who have suffered conscious paralysis on the operating table, one can begin to imagine.

Comment by gwern on A Rational Altruist Punch in The Stomach · 2020-02-13T23:45:56.908Z · score: 27 (6 votes) · LW · GW

Philip Trammell has criticized my comment here: https://philiptrammell.com/static/discounting_for_patient_philanthropists.pdf#page=33 He makes 3 points:

1. Perhaps the many failed philanthropies were not meant to be permanent?

First, they almost certainly were. Most philanthropies were clan- or religious-based. Things like temples and monasteries are meant to be as eternal as possible. What Buddhist monastery or Catholic cathedral was ever set up with the idea that it'd wind up everything in a century or two? What dedication of a golden tripod to the Oracle at Delphi was done with the idea that they'd be done with the whole silly paganism thing in half a millennium? What clan compound was created by a patriarch not hoping to be commemorated and his grave honored for generations without end? Donations were inalienable, and often made with stipulations like a mass being said for the donator's soul once a year forever or until the Second Coming, whichever happened first.
How many funny traditions or legal requirements at Oxford or Cambridge, which survive due to a very unusual degree of institutional & property-right continuity in England, came with expiration dates or entailments which expired? (None come to mind.) The Islamic world went so far as to legally remove any option of being temporary! To the extent that philanthropies are not encumbered today, it's not for any lack of desire by philanthropists (as charities constantly complain & dream of 'unrestricted' funds), but legal systems refusing to enforce them via the dead-hand doctrine, disruption of property rights, and creative destruction. My https://www.gwern.net/The-Narrowing-Circle is relevant, as is Fukuyama's The Origins of Political Order, which makes clear what a completely absurd thing that is to suggest of places like Rome or China.

Second, even if they were not, most of them do not expire due to reaching scheduled expiration dates, showing that existing structures are inadequate even to the task of lasting just a little while. Trammell seems to believe there is some sort of silver-bullet institutional structure that might allow a charity to accumulate wealth for centuries or millennia, if only the founders purchased the 1000-year charity plan instead of cheaping out by buying the limited-warranty 100-year charity plan. But there isn't.

2. His second point is, I'm not sure how to summarize it:

> Second, it is misleading to cite the large numbers of failed philanthropic institutions (such as Islamic waqfs) which were intended to be permanent, since their closures were not independent. For illustration, if a wave of expropriation (say, through a regional conquest) is a Poisson process with λ = 0.005, then the probability of a thousand-year waqf is 0.7%. Splitting a billion-dollar waqf into a billion one-dollar waqfs, and observing that none survive the millennium, will give the impression that "the long-term waqf survival rate is less than one in one billion".
I can't see how this point is relevant. Aside from his hypothetical not being the case (the organizational death statistics are certainly not based on any kind of fission like that), if a billion waqfs all manage to fail, that is a valid observation about the durability of waqfs. If they were split apart, then they all had separate managers/staff, separate tasks, separate endowments, etc. There will be some correlation, and this will affect, say, confidence intervals - but the percentage is what it is.

3. His third point argues that the risk needs to grow with size for perpetuities to be bad ideas.

This doesn't seem right either. I gave many reasons against perpetuities quite aside from that, and his arguments against the very plausible increasing of risk aren't great either (pogroms vs the expropriation of the Church? but how can that be comparable when by definition the net worth of the poor is near-zero?).

> A handful of relatively recent attempts explicitly to found long-term trusts have met with partial success (Benjamin Franklin) or comical failure (James Holdeen). Unfortunately, there have not been enough of these cases to draw any compelling conclusions.

I'd say there's more than enough when you don't handwave away millennia of examples. Incidentally, I ran into another failure of long-term trusts recently: Wellington R. Burt's estate trustees managed to, over almost a century of investment in the USA during possibly the greatest sustained total economic growth in all of human history, with only minor disbursements and some minor legal defeats, no scandals or expropriation or anything, nevertheless realize a real total return of around -75% (turning the then-inflation-adjusted equivalent of ~$400m into ~$100m).
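As a quick check on the Poisson figure Trammell quotes above: survival under a constant Poisson hazard is exp(−λt), and with λ = 0.005 per year over 1,000 years that is exp(−5) ≈ 0.67%, matching his 0.7%:

```python
import math

hazard = 0.005  # expropriation waves per year (Poisson rate, from the quote)
years = 1000
p_survive = math.exp(-hazard * years)  # P(zero expropriation events in 1,000 years)
print(f"P(1000-year survival) = {p_survive:.2%}")  # prints 0.67%
```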
Comment by gwern on Suspiciously balanced evidence · 2020-02-13T17:49:53.565Z · score: 13 (4 votes) · LW · GW

If you look at prediction datasets like PredictionBook or GJP or other calibration datasets (or even prediction markets, with their longshot biases), which cover a wide variety of questions (far wider than most policy or political debates, and typically with neutral valence, such that most predictors are disinterested), it seems like people are generally uncalibrated in the direction of extremes, not 50%. So that's evidence against people actually holding beliefs which are biased to be too close to 50%, and suggests something else is going on, like topic filtering or attempting to appear rhetorically reasonable/nonfanatical. (The second definitely seems like a concern. I notice that people seem to really shy away from publicly espousing strong stands, like when we were discussing the Amanda Knox case, or putting a 0%/100% on PB even when that is super obviously correct just from base rates; there are clear status/signaling dynamics going on there.)

Comment by gwern on What are the risks of having your genome publicly available? · 2020-02-13T03:28:58.886Z · score: 16 (6 votes) · LW · GW

It's worth noting that the Personal Genome Project was created ~2008 in part to test this question empirically: participants upload their genomes to the PGP website, where they are 100% public, and they are periodically surveyed and asked if they have experienced any harms from their genome being available. As far as I know, the several hundred/thousand participants have yet to report any substantial harms.

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-02-03T17:59:51.374Z · score: 10 (4 votes) · LW · GW

One question I forgot: how should multi-author citations, currently denoted by 'et al' or 'et al.', be handled?
That notation is pretty ridiculous: not only does it take up 6 letters and is natural language where there should be a symbol, it's ambiguous & hard to machine-parse, and it's not even English*! Writing 'Foo et al2010' or 'Fooet al 2010' doesn't look very nice, and it makes the subscripting far less compact. My current suggestion is to do the obvious thing: when you elide or omit something in English or technical writing, how do you express that? Why, with an ellipsis '…', of course. So one would just write 'Foo…2010'. Horizontal ellipses aren't the only kind: there are several others in Unicode, including midline '⋯', vertical '⋮', and even down-right-diagonal '⋱', so one could imagine doing 'Foo⋯2010' or 'Foo⋮2010' or 'Foo⋱2010'. The vertical ellipsis is nice, but unfortunately it's hard to see the first/top dot, because it almost overlaps with the final letter. The midline ellipsis is very middling, and doesn't really have any virtue. But I particularly like the last one, the down-right-diagonal ellipsis, because it works visually so well - it leads the eye down and to the right, and is clear about it being an entire phrase, so to speak.

* Actually, it's not even Latin, because it's an abbreviation for the actual Latin phrase, et alii (to save you one character and also avoid any question of conjugating the Latin - this shit is fractal, is what I'm saying), but as pseudo-Latin, that means that many will italicize it, as foreign words/phrases usually are - but now that is even more work, even more visual clutter, and introduces ambiguity with other uses of italics, like titles. Truly a nasty bit of work.

Comment by gwern on Why Do You Keep Having This Problem? · 2020-01-20T21:11:59.997Z · score: 8 (3 votes) · LW · GW

> before word got to him that the layout was broken on mobile devices

Emphasizing the point even more - word didn't get to me. I just thought to myself, 'the layout might not be good on mobile. I ought to check.' (It was not good.)
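To illustrate the machine-parseability argument in the subscripting comment above: a single-character convention like 'Foo⋱2010' can be parsed with a one-line regex, whereas "et al" needs special-casing for spacing, punctuation, and italics. This parser is purely my hypothetical sketch of the proposed convention, not an implemented standard:

```python
import re

# Author name, optional down-right-diagonal ellipsis marking elided co-authors,
# then a 4-digit year with an optional disambiguating letter.
CITATION = re.compile(r"(?P<author>\w+)(?P<etal>⋱?)(?P<year>\d{4}[a-z]?)")

def parse_citation(s: str):
    """Return (author, has_coauthors, year), or None if s isn't a citation."""
    m = CITATION.fullmatch(s)
    if not m:
        return None
    return m["author"], bool(m["etal"]), m["year"]

print(parse_citation("Foo⋱2010"))   # ('Foo', True, '2010')
print(parse_citation("Foo2010a"))   # ('Foo', False, '2010a')
print(parse_citation("et al"))      # None: whitespace breaks the pattern
```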
Comment by gwern on Bay Solstice 2019 Retrospective · 2020-01-18T00:32:09.161Z · score: 23 (7 votes) · LW · GW

Yeah, after watching that, I can't see how anyone reasonable could dislike it. That was awesome.

Comment by gwern on human psycholinguists: a critical appraisal · 2020-01-16T18:38:43.402Z · score: 4 (2 votes) · LW · GW

> When I noticed a reply from 'gwern', I admit I was mildly concerned that there would be a link to a working webpage and a paypal link

Oh, well, if you want to pay for StyleGAN artwork, that can be arranged.

> Do you think training a language model, whether it is GPT-2 or a near-term successor, entirely on math papers could have value?

No, but mostly because there are so many more direct approaches to using NNs in math, like (to cite just the NN math papers I happened to read yesterday) planning in latent space or seq2seq rewriting. (Just because you can solve math problems in natural-language input/output format with Transformers doesn't mean you should try to solve it that way.)

Comment by gwern on human psycholinguists: a critical appraisal · 2020-01-16T18:33:58.405Z · score: 3 (1 votes) · LW · GW

Feeding in output as input is exactly what is iterative about DeepDream, and the scenario does not change the fact that GPT-2 and DeepDream are fundamentally different in many important ways and there is no sense in which they are 'fundamentally the same', not even close. And let's consider the chutzpah of complaining about tone when you ended your own highly misleading comment with the snide "But by all means, spend your $1000 on it. Maybe you'll learn something in the process."

Comment by gwern on human psycholinguists: a critical appraisal · 2020-01-16T16:38:04.244Z · score: 4 (2 votes) · LW · GW

> I predict that you think artwork created with StyleGAN by definition cannot have artistic merit on its own.

Which is amusing because when people look at StyleGAN artwork and they don't realize it, like my anime faces, they often quite like it. Perhaps they just haven't seen anime faces drawn by a true Scotsman yet.

Comment by gwern on human psycholinguists: a critical appraisal · 2020-01-16T16:30:05.451Z · score: 4 (4 votes) · LW · GW

> GPT-2 is best described IMHO as "DeepDream for text." They use different neural network architectures, but that's because analyzing images and natural language require different architectures. Fundamentally their complete-the-prompt-using-training-data design is the same.

If by 'fundamentally the same' you mean 'actually they're completely different and optimize completely different things and give completely different results on completely different modalities', then yeah, sure. (Also, a dog is an octopus.) DeepDream is an iterative optimization process which tries to maximize the class-ness of an image input (usually, dogs); a language model like GPT-2 is predicting the most likely next observation in a natural-text dataset, and can be fed its own guesses. They bear about as much relation as a propaganda poster and a political-science paper.

Comment by gwern on A LessWrong Crypto Autopsy · 2020-01-15T00:34:59.905Z · score: 21 (7 votes) · LW · GW

There's something I should note that doesn't come through in this post: one of the reasons I was interested in Bitcoin in 2011 is because it was obvious to me that the 'experts' (economists, cryptographers, what have you) scoffing at it Just Did Not Get It.

The critics generally made blitheringly stupid criticisms which showed that they had not even read the (very short) whitepaper, saying things like 'what if the Bitcoin operator just rolls back transactions or gets hacked' or 'what stops miners from just rewriting the history' or 'the deflationary death spiral will kick in any day now' or 'what happens when someone uses a lot of computers to take over the network'. (There were much dumber ones than that, which I have mercifully forgotten.) Even the most basic reading comprehension was enough to reveal most of the criticisms were sheer nonsense, you didn't need to be a crypto expert (certainly I was not, and still am not, either a mathematician or C++ coder, and wouldn't know what to do with an exponent if you gave it to me). Many of them showed their ideological cards, like Paul Krugman or Charles Stross, and revealed that their objections were excuses because they disliked the potential reduction in state power - I mean, wow, talk about 'politics is the mind-killer'. I think I remarked on IRC back then that every time I read a blog post or op-ed 'debunking' Bitcoin, it made me want to buy even more Bitcoin. (I couldn't because I was pretty much bankrupt and wound up selling most of the Bitcoin I did have. But I sure did want to.)

Even cypherpunks often didn't seem to get it, and I wrote a whole essay in 2011 trying to explain their incomprehension, to explain to them what the whole point was, and why Bitcoin worked in practice but not in their theory ("Bitcoin is Worse is Better"). So, it was quite clear to me that Bitcoin was, objectively, misunderstood, and a 'secret' in the Thiel sense.

And in a market where a price is either too low or too high, 'reversed stupidity' is intelligence...

(If anyone was wondering: I don't think this argument really holds right now. Discussions of Bitcoin are far more sophisticated, and the critics generally avoid the dumbest old arguments. They often even manage to avoid making any factual errors - although I've wondered about some of the Tether criticisms, which rely on what looks like rather dubious statistics.)

What sort of luck or cognitive strategy did this require? I think it did require a good deal of luck simply to be in LW circles, where we have enough cypherpunk influence to happen to hear about Bitcoin early on. Otherwise, it would be unreasonable to expect people to somehow pluck Bitcoin out of the entire universe of obscure niche products like 'all penny stocks'. But once you pass that filter, all you really needed was to, while reading about interesting fun developments online, simply not let your brains fall out: notice that the critics were not doing even the most basic due diligence or making logically valid arguments and had clear (bad) reasons for opposing Bitcoin, and understand that this implied Bitcoin was undervalued and a financial opportunity. I definitely do not see anything negative about most people for not getting into Bitcoin in 2011, since there's no good ex ante reason for them to have been interested in or read up on it, and doing so in general is probably a bad use of time - but for LWers, we had other reasons for being interested enough in Bitcoin to realize what an opportunity it was, so for us it is a bit of a failure not to have gotten involved.

Comment by gwern on How has the cost of clothing insulation changed since 1970 in the USA? · 2020-01-12T23:48:21.486Z · score: 5 (2 votes) · LW · GW

My favorite example is teddy bears:

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-10T03:06:29.310Z · score: 3 (1 votes) · LW · GW

You'll need a bunch in a single passage. If you don't need to disambiguate a large hairball of differently-timed people (like in My Best and Worst Mistake), then you probably shouldn't bother in general.

Would you say that about citations? "Oh, you only use one source in this paragraph, so just omit the author/year/title. The reader can probably figure it out from mentions elsewhere if they really need to anyway." That the use of subscripts is particularly clear when you have a hairball of references (in an example constructed to show benefits) doesn't mean solitary uses are useless.

I'm struggling to see how this is an improvement over "on FB" or "on Facebook" for either the reader or the writer, assuming you don't want to bury-but-still-mention the medium/audience.

It's a matter of emphasis. Yes, you can write it out longhand, much as you can write out any equation or number longhand, as not "22⁄230" but "twenty-two divided by two-hundred-and-thirty", if necessary. Natural language is Turing-complete, so to speak: anything you do in a typographic way or a DSL like equations can be done as English (and of course, prior to the invention of various notations, people did write out equations like that, as painful as it is trying to imagine doing algebra while writing everything out without the benefit of even equals signs). But you usually shouldn't.

Is the mention of its being on Facebook in that example so important that it must be called out like that? I didn't think so. It seemed like the kind of snark a husband might make in passing. Writing it out feels like 'explaining the joke'. Snark doesn't work if you need to surround it in flashing neon lights with arrows pointing inward saying "I am being sarcastic and cynical and ironic here". You can modify the example in your head to something which puts less emphasis on Facebook, if you feel strongly about it.

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-10T01:41:29.119Z · score: 3 (1 votes) · LW · GW

I don't think they're confusingly different. See the "A single unified notation..." part. Distinguishing the two typographically is codex chauvinism.

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-09T23:35:45.808Z · score: 3 (1 votes) · LW · GW

Yes, seems sensible: hard to go wrong if you copy the Pandoc syntax. You'll need to add a mention of this to the LW docs, of course, because the existing docs don't mention sub/superscript either way, and users might assume that LW still copies the Reddit behavior of no-support.

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-09T20:10:09.747Z · score: 3 (1 votes) · LW · GW

It seems worth it in nerdy circles (i.e. among people who’re already familiar with subscripting) for passages that are dense with jumping around in time as in your chosen example, but I’d expect these sorts of passages to be rare, regardless of the expected readership.

But if passages aren't dense with that or other uses, then you wouldn't need to use subscripting much, by definition....

Perhaps you meant, "assuming that it remains a unique convention, most readers will have to pay a one-time cost of comprehension/dislike as overhead, and only then can gain from it; so you'll need them to read a lot of it to pay off, and such passages may be quite rare"? Definitely a problem. A bit less of one if I were to start using it systematically, though, since I could assume that many readers will have read one of my other writings using the convention and already paid the price.

Also, it’s unclear why “on Facebook” deserves to be compressed into an evidential.

Because it brings out the contrast: one is based on first-hand experience & observation, and the other is later socially-performative kvetching for an audience such as family or female acquaintances. The medium is the message, in this case.

At the very least, “FB” isn’t immediately obvious what it refers to, whereas a date is easier to figure out from context.

I waffled on whether to make it 'FB' or 'Facebook'. I thought "FB" as an abbreviation was sufficiently widely known at this point to make it natural. But maybe not, if even LWers are thrown by it.

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-09T17:11:51.049Z · score: 8 (3 votes) · LW · GW

On a side note: it really would be nice if we could have normal Markdown subscripts/superscripts supported on LW. It's not like we don't discuss STEM topics all the time, and using LaTeX is overkill and easy to get wrong if you don't regularly write TeX.

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-09T00:21:34.480Z · score: 4 (2 votes) · LW · GW

Yes, this relies heavily on the fact that subscripts are small/compact and can borrow meaning from their STEM uses. Doing it as superscripts, for example, probably wouldn't work as well, because we don't use superscripts for this sort of thing & already use superscripts heavily for other things like footnotes, while some entirely new symbol or layout (if you did it as, say, a third column, or used some sort of 2-column layout like in some formal languages) is asking to fail & would make it harder to fall back to natural language.

How are you doing inflation adjustment? I mocked up a bunch of possibilities and I wasn't satisfied with any of them. If you suppress one of the years, you risk confusing the reader, given that it's a new convention; but if you provide all the variables, it ensures comprehension but is busy & intrusive.

Comment by gwern on Dec 2019 gwern.net newsletter · 2020-01-06T03:05:17.233Z · score: 10 (3 votes) · LW · GW

You're welcome. I didn't want to write it: I don't find the topic that interesting, and such a post always takes a frustrating amount of time to write because one has to dig into the details & jailbreak a lot of papers etc, and I'd rather write more constructive things, like about all our GPT-2 projects, than yet another demoralizing criticism piece. But people kept mentioning it as if it weren't awful, so... At least it should spare anyone else, right?

Comment by gwern on Less Wrong Poetry Corner: Walter Raleigh's "The Lie" · 2020-01-05T03:45:44.118Z · score: 25 (8 votes) · LW · GW

Sometimes it may take a thief to catch a thief. If it was written in 1592, Raleigh was at his height then, and had much opportunity to see inside the institutions he attacks.

I'm reminded of a book review I wrote last week about famed psychologist Robert Rosenthal's book on bias and error in psychology & the sciences.

Rosenthal writes lucidly about how experimenter biases can skew results or skew the analysis or cause publication bias (a problem he played a major role in raising awareness of, & in developing meta-analysis to address), gives many examples, and proposes novel & effective measures like result-blind peer review. A veritable former-day Ioannidis, you might say. But in the same book, he shamelessly reports some of the worst psychological research ever done, like the 'Pygmalion effect', which he helped develop meta-analysis to defend (despite its nonexistence); the book is a tissue of unreplicable absurd effects from start to finish, and Rosenthal has left a toxic legacy of urban legends and statistical gimmicks which are still being used to defend psi, among other things.

Something something the line goes through every human heart...

Comment by gwern on Parameter vs Synapse? · 2019-12-29T05:36:36.905Z · score: 7 (4 votes) · LW · GW

Drexler's recent AI whitepaper had some arguments in a similar vein about functional equivalence and necessary compute and comparing CNNs with the retina or visual cortex, so you might want to look at that.

Comment by gwern on More on polio and randomized clinical trials · 2019-12-28T19:16:17.957Z · score: 21 (6 votes) · LW · GW

One idea occurred to me that I haven’t heard anyone suggest: the trial didn’t have to be 50-50. With a large enough group, you could hold back a smaller subset as the control (80-20?). Again, you need statistics here to tell you how this affects the power of your test.

You can see that as just a simple version of an adaptive trial, with one step. I don't think it in any way resolves the basic problem people have: if it's immoral to give half the sample the placebo, it's not exactly clear why giving a fifth of the sample the placebo is moral.
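(A side note on the statistics the post asks for: holding back a smaller control group does cost you power for a fixed total sample. A rough sketch of the trade-off, using a textbook normal-approximation power calculation for a two-proportion test - the specific event rates here are made-up illustrative numbers, not the actual polio-trial figures:)

```python
import math

def two_prop_power(n_total, frac_control, p_control, p_treat, alpha_z=1.96):
    """Approximate power of a two-sided two-proportion z-test,
    splitting n_total into a control fraction and a treatment fraction.
    Standard normal-approximation formula; illustrative only."""
    n_c = n_total * frac_control
    n_t = n_total * (1 - frac_control)
    p_bar = (n_c * p_control + n_t * p_treat) / n_total
    se0 = math.sqrt(p_bar * (1 - p_bar) * (1 / n_c + 1 / n_t))   # SE under H0
    se1 = math.sqrt(p_control * (1 - p_control) / n_c +
                    p_treat * (1 - p_treat) / n_t)               # SE under H1
    z = (abs(p_control - p_treat) - alpha_z * se0) / se1
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))                # Phi(z)

# Hypothetical rare-disease rates: 0.05% untreated vs 0.02% vaccinated.
for frac in (0.5, 0.2):
    print(f"control fraction {frac}: power ≈ "
          f"{two_prop_power(400_000, frac, 0.0005, 0.0002):.3f}")
```

With the total n fixed, the 50-50 split always has at least as much power as the 80-20 split; whether the loss matters depends on how over-powered the trial is to begin with.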

So, the tests had to be more than scientifically sound. They had to be politically sound. The trials had to be so conclusive that it would silence even jealous critics using motivated, biased reasoning. They had to prove themselves not only to a reasoning mind, but to a committee. A proper RCT was needed for credibility as much as, or more than, for science.

This is an important point. One thing I only relatively recently understood about experiment design was something Gelman has mentioned in passing on occasion: an ideal Bayesian experimenter doesn't randomize!

Why not? Because, given their priors, there is always another allocation rule which still accomplishes the goal of causal inference (the rule makes its allocations independent of all confounders on average, like randomization, so it estimates the causal effect) but does so with the same or lower variance. For example, alternating allocation keeps the experimental and control groups' n as nearly identical as possible, while randomizing one-by-one will usually leave an excess n in one group, which is inefficient. These sorts of rules pose no problem and can be included in the Bayesian model of the process.

The problem is that it will then be inefficient for observers with different priors, who will learn much less. Depending on their priors or models, it may be almost entirely uninformative. By using explicit randomization and no longer making allocations which are based on your priors in any way, you sacrifice efficiency, but the results are equally informative for all observers. If you model the whole process and consider the need to persuade outside observers in order to implement the optimal decision, then randomization is clearly necessary.
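(The group-imbalance point is easy to check numerically. A minimal sketch, with made-up parameters: coin-flipping 100 subjects into two arms leaves the arms unequal by several subjects on average, while alternating allocation bounds the imbalance at 1:)

```python
import random
import statistics

def coinflip_imbalance(n):
    """|n_treatment - n_control| when each subject is randomized 50/50."""
    treat = sum(random.random() < 0.5 for _ in range(n))
    return abs(treat - (n - treat))

def alternating_imbalance(n):
    """ABAB... allocation: groups differ by at most 1 subject."""
    return n % 2

random.seed(0)
n, reps = 100, 5000
mean_imbalance = statistics.mean(coinflip_imbalance(n) for _ in range(reps))
print(f"mean imbalance, coin-flips (n={n}): {mean_imbalance:.1f}")
# theory: E|n_t - n_c| ≈ sqrt(2n/pi) ≈ 8 for n=100
print(f"imbalance, alternation (n={n}): {alternating_imbalance(n)}")
```

The excess subjects in the larger arm contribute less information per subject, which is where the variance penalty of simple randomization comes from.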

Comment by gwern on Finding a quote: "proof by contradiction is the closest math comes to irony" · 2019-12-26T18:14:30.822Z · score: 14 (4 votes) · LW · GW

You probably read my "One Man's Modus Ponens" page, where I quote a Timothy Gowers essay on proof by contradiction and he says (and then goes on to discuss two ways to regard the irrationality of √2 as compared with complex numbers):

...a suggestion was made that proofs by contradiction are the mathematician’s version of irony. I’m not sure I agree with that: when we give a proof by contradiction, we make it very clear that we are discussing a counterfactual, so our words are intended to be taken at face value. But perhaps this is not necessary. ...

...Integers with this remarkable property are quite unlike the integers we are familiar with: as such, they are surely worthy of further study.

...Numbers with this remarkable property are quite unlike the numbers we are familiar with: as such, they are surely worthy of further study.

Comment by gwern on Why the tails come apart · 2019-12-25T21:49:50.482Z · score: 7 (3 votes) · LW · GW

I have found something interesting in the 'asymptotic independence' order statistics literature: apparently it's been proven since 1960 that the extremes of two correlated distributions are asymptotically independent (excepting, obviously, r = ±1). So as you increase n, the probability of double-maxima decreases to the lower bound of 1/n.

The intuition here seems to be that n increases faster than increased deviation for any r, which functions as a constant-factor boost; so if you make n arbitrarily large, you can arbitrarily erode away the constant-factor boost of any r, and thus decrease the max-probability.

I suspected as much from my Monte Carlo simulations (Figure 2), but nice to have it proven for the maxima and minima. (I didn't understand the more general papers, so I'm not sure what other order statistics are asymptotically independent: it seems like it should be all of them? But some papers need to deal with multiple classes of order statistics, so I dunno - are there order statistics, like maybe the median, where the probability of being the same order in both samples doesn't converge on 1/n?)
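(A minimal Monte Carlo sketch of the maxima case, along the lines of my simulations - parameters here are arbitrary, chosen just to show the trend: even at r = 0.9, the probability that the same observation is the maximum on both variables shrinks toward the 1/n independence baseline as n grows:)

```python
import random

def p_double_max(n, r, trials=2000):
    """Estimate P(the same observation is the max on both of two
    standard-normal variables correlated at r), via bivariate sampling."""
    b = (1 - r * r) ** 0.5
    hits = 0
    for _ in range(trials):
        xs = [random.gauss(0, 1) for _ in range(n)]
        ys = [r * x + b * random.gauss(0, 1) for x in xs]
        if xs.index(max(xs)) == ys.index(max(ys)):
            hits += 1
    return hits / trials

random.seed(0)
for n in (10, 100, 1000):
    print(f"n={n:4d}: P(double max) ≈ {p_double_max(n, 0.9):.3f}"
          f"  (independence baseline 1/n = {1/n:.3f})")
```

The probability stays well above 1/n at any finite n (the r boost never vanishes, it just gets eroded), which matches the intuition that n outruns the constant-factor deviation boost.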

Comment by gwern on Neural networks as non-leaky mathematical abstraction · 2019-12-19T18:57:51.425Z · score: 4 (3 votes) · LW · GW

So is this an argument for the end-to-end principle?

Comment by gwern on George's Shortform · 2019-12-17T22:12:15.560Z · score: 3 (1 votes) · LW · GW

Terminology sometimes used to distinguish between 'good' and 'bad' stress is "eustress" vs "distress".

Comment by gwern on What Are Meetups Actually Trying to Accomplish? · 2019-12-16T03:20:03.345Z · score: 3 (1 votes) · LW · GW

'marginal'?

Comment by gwern on Under what circumstances is "don't look at existing research" good advice? · 2019-12-13T18:42:45.443Z · score: 8 (3 votes) · LW · GW

Since you mention physics, it's worth noting Feynman was a big proponent of this for physics, and seemed to have multiple reasons for it.

Comment by gwern on Minicamps on Rationality and Awesomeness: May 11-13, June 22-24, and July 21-28 · 2019-12-13T14:48:29.922Z · score: 6 (2 votes) · LW · GW

If you have relatively few choices and properties are correlated (as of course they are), I'm not sure how much it matters. I did a simulation of this for embryo selection with n=10, and found that partially randomizing the utility weights made little difference.

Comment by gwern on Planned Power Outages · 2019-12-11T22:01:22.690Z · score: 5 (2 votes) · LW · GW

(Quite a lot is public outside Google, I've found. It's not necessarily easy to find, but whenever I talk to Googlers or visit, I find out less than I expected. Only a few things I've been told genuinely surprised me, and honestly, I suspected them anyway. Google's transparency is considerably underrated.)