## Posts

Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal 2020-01-08T22:20:20.290Z · score: 53 (11 votes)
Dec 2019 gwern.net newsletter 2020-01-04T20:48:48.788Z · score: 16 (6 votes)
Nov 2019 gwern.net newsletter 2019-12-02T21:16:04.846Z · score: 14 (4 votes)
October 2019 gwern.net newsletter 2019-11-14T20:26:34.236Z · score: 12 (3 votes)
September 2019 gwern.net newsletter 2019-10-04T16:44:43.147Z · score: 22 (4 votes)
"AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence", Clune 2019 2019-09-10T21:33:08.837Z · score: 14 (4 votes)
August 2019 gwern.net newsletter (popups.js demo) 2019-09-01T17:52:01.011Z · score: 12 (4 votes)
"Designing agent incentives to avoid reward tampering", DeepMind 2019-08-14T16:57:29.228Z · score: 29 (9 votes)
July 2019 gwern.net newsletter 2019-08-01T16:19:59.893Z · score: 24 (5 votes)
How Should We Critique Research? A Decision Perspective 2019-07-14T22:51:59.285Z · score: 49 (12 votes)
June 2019 gwern.net newsletter 2019-07-01T14:35:49.507Z · score: 30 (5 votes)
On Seeing Through 'On Seeing Through: A Unified Theory': A Unified Theory 2019-06-15T18:57:25.436Z · score: 27 (7 votes)
On Having Enough Socks 2019-06-13T15:15:21.946Z · score: 21 (6 votes)
May gwern.net newsletter 2019-06-01T17:25:11.740Z · score: 17 (5 votes)
"One Man's Modus Ponens Is Another Man's Modus Tollens" 2019-05-17T22:03:59.458Z · score: 34 (5 votes)
April 2019 gwern.net newsletter 2019-05-01T14:43:18.952Z · score: 11 (2 votes)
Recent updates to gwern.net (2017–2019) 2019-04-28T20:18:27.083Z · score: 36 (8 votes)
"Everything is Correlated": An Anthology of the Psychology Debate 2019-04-27T13:48:05.240Z · score: 49 (7 votes)
March 2019 gwern.net newsletter 2019-04-02T14:17:38.032Z · score: 19 (3 votes)
February gwern.net newsletter 2019-03-02T22:42:09.490Z · score: 13 (3 votes)
'This Waifu Does Not Exist': 100,000 StyleGAN & GPT-2 samples 2019-03-01T04:29:16.529Z · score: 39 (12 votes)
January 2019 gwern.net newsletter 2019-02-04T15:53:42.553Z · score: 15 (5 votes)
"Forecasting Transformative AI: An Expert Survey", Gruetzemacher et al 2019 2019-01-27T02:34:57.214Z · score: 17 (8 votes)
"AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] 2019-01-24T20:49:01.350Z · score: 62 (23 votes)
Visualizing the power of multiple step selection processes in JS: Galton's bean machine 2019-01-12T17:58:34.584Z · score: 27 (8 votes)
Littlewood's Law and the Global Media 2019-01-12T17:46:09.753Z · score: 37 (8 votes)
Evolution as Backstop for Reinforcement Learning: multi-level paradigms 2019-01-12T17:45:35.485Z · score: 18 (4 votes)
December gwern.net newsletter 2019-01-02T15:13:02.771Z · score: 20 (4 votes)
Internet Search Tips: how I use Google/Google Scholar/Libgen 2018-12-12T14:50:30.970Z · score: 54 (13 votes)
November 2018 gwern.net newsletter 2018-12-01T13:57:00.661Z · score: 35 (8 votes)
October gwern.net links 2018-11-01T01:11:28.763Z · score: 31 (8 votes)
Whole Brain Emulation & DL: imitation learning for faster AGI? 2018-10-22T15:07:54.585Z · score: 15 (5 votes)
New /r/gwern subreddit for link-sharing 2018-10-17T22:49:36.252Z · score: 46 (14 votes)
September links 2018-10-08T21:52:10.642Z · score: 18 (6 votes)
Genomic Prediction is now offering embryo selection 2018-10-07T21:27:54.071Z · score: 39 (14 votes)
August gwern.net links 2018-09-25T15:57:20.808Z · score: 18 (5 votes)
July gwern.net newsletter 2018-08-02T13:42:16.534Z · score: 24 (8 votes)
June gwern.net newsletter 2018-07-04T22:59:00.205Z · score: 36 (8 votes)
May gwern.net newsletter 2018-06-01T14:47:19.835Z · score: 73 (14 votes)
$5m cryptocurrency donation to Alcor by Brad Armstrong in memory of LWer Hal Finney 2018-05-17T20:31:07.942Z · score: 48 (12 votes)
Tech economics pattern: "Commoditize Your Complement" 2018-05-10T18:54:42.191Z · score: 97 (27 votes)
April links 2018-05-10T18:53:48.970Z · score: 20 (6 votes)
March gwern.net link roundup 2018-04-20T19:09:29.785Z · score: 27 (6 votes)
Recent updates to gwern.net (2016-2017) 2017-10-20T02:11:07.808Z · score: 7 (7 votes)
The NN/tank Story Probably Never Happened 2017-10-20T01:41:06.291Z · score: 2 (2 votes)
Regulatory lags for New Technology [2013 notes] 2017-05-31T01:27:52.046Z · score: 5 (5 votes)
"AIXIjs: A Software Demo for General Reinforcement Learning", Aslanides 2017 2017-05-29T21:09:53.566Z · score: 4 (4 votes)
Keeping up with deep reinforcement learning research: /r/reinforcementlearning 2017-05-16T19:12:04.201Z · score: 3 (4 votes)
"The unrecognised simplicities of effective action #2: 'Systems engineering' and 'systems management' - ideas from the Apollo programme for a 'systems politics'", Cummings 2017 2017-02-17T00:59:04.256Z · score: 9 (8 votes)
Decision Theory subreddit 2017-02-07T18:42:55.277Z · score: 6 (7 votes)

## Comments

Comment by gwern on human psycholinguists: a critical appraisal · 2020-01-16T18:38:43.402Z · score: 4 (2 votes) · LW · GW

When I noticed a reply from 'gwern', I admit I was mildly concerned that there would be a link to a working webpage and a paypal link

Oh, well, if you want to pay for StyleGAN artwork, that can be arranged.

Do you think training a language model, whether it is GPT-2 or a near-term successor, entirely on math papers could have value?

No, but mostly because there are so many more direct approaches to using NNs in math, like (to cite just the NN math papers I happened to read yesterday) planning in latent space or seq2seq rewriting. (Just because you can solve math problems in natural-language input/output format with Transformers doesn't mean you should try to solve them that way.)
Comment by gwern on human psycholinguists: a critical appraisal · 2020-01-16T18:33:58.405Z · score: 3 (1 votes) · LW · GW

Feeding in output as input is exactly what is iterative about DeepDream, and the scenario does not change the fact that GPT-2 and DeepDream are fundamentally different in many important ways and there is no sense in which they are 'fundamentally the same', not even close. And let's consider the chutzpah of complaining about tone when you ended your own highly misleading comment with the snide:

But by all means, spend your $1000 on it. Maybe you'll learn something in the process.

Comment by gwern on human psycholinguists: a critical appraisal · 2020-01-16T16:38:04.244Z · score: 4 (2 votes) · LW · GW

I predict that you think artwork created with StyleGAN by definition cannot have artistic merit on its own.

Which is amusing because when people look at StyleGAN artwork and they don't realize it, like my anime faces, they often quite like it. Perhaps they just haven't seen anime faces drawn by a true Scotsman yet.

Comment by gwern on human psycholinguists: a critical appraisal · 2020-01-16T16:30:05.451Z · score: 4 (4 votes) · LW · GW

GPT-2 is best described IMHO as "DeepDream for text." They use different neural network architectures, but that's because analyzing images and natural language require different architectures. Fundamentally their complete-the-prompt-using-training-data design is the same.

If by 'fundamentally the same' you mean 'actually they're completely different and optimize completely different things and give completely different results on completely different modalities', then yeah, sure. (Also, a dog is an octopus.) DeepDream is an iterative optimization process which tries to maximize the class-ness of an image input (usually, dogs); a language model like GPT-2 is predicting the most likely next observation in a natural text dataset which can be fed its own guesses. They bear about as much relation as a propaganda poster and a political science paper.

Comment by gwern on A LessWrong Crypto Autopsy · 2020-01-15T00:34:59.905Z · score: 20 (6 votes) · LW · GW

There's something I should note that doesn't come through in this post: one of the reasons I was interested in Bitcoin in 2011 is because it was obvious to me that the 'experts' (economists, cryptographers, what have you) scoffing at it Just Did Not Get It.

The critics generally made blitheringly stupid criticisms which showed that they had not even read the (very short) whitepaper, saying things like 'what if the Bitcoin operator just rolls back transactions or gets hacked' or 'what stops miners from just rewriting the history' or 'the deflationary death spiral will kick in any day now' or 'what happens when someone uses a lot of computers to take over the network'. (There were much dumber ones than that, which I have mercifully forgotten.) Even the most basic reading comprehension was enough to reveal most of the criticisms were sheer nonsense, you didn't need to be a crypto expert (certainly I was not, and still am not, either a mathematician or C++ coder, and wouldn't know what to do with an exponent if you gave it to me). Many of them showed their ideological cards, like Paul Krugman or Charles Stross, and revealed that their objections were excuses because they disliked the potential reduction in state power - I mean, wow, talk about 'politics is the mind-killer'. I think I remarked on IRC back then that every time I read a blog post or op-ed 'debunking' Bitcoin, it made me want to buy even more Bitcoin. (I couldn't because I was pretty much bankrupt and wound up selling most of the Bitcoin I did have. But I sure did want to.)

Even cryptopunks often didn't seem to get it, and I wrote a whole essay in 2011 trying to explain their incomprehension and explain to them what the whole point was and why it worked in practice but not their theory ("Bitcoin is Worse is Better"). So, it was quite clear to me that Bitcoin was, objectively, misunderstood, and a 'secret' in the Thiel sense.

And in a market where a price is either too low or too high, 'reversed stupidity' is intelligence...

(If anyone was wondering: I don't think this argument really holds right now. Discussions of Bitcoin are far more sophisticated, and the critics generally avoid the dumbest old arguments. They often even manage to avoid making any factual errors - although I've wondered about some of the Tether criticisms, which rely on what looks like rather dubious statistics.)

What sort of luck or cognitive strategy did this require? I think it did require a good deal of luck simply to be in LW circles where we have enough cryptopunk influence to happen to hear about Bitcoin early on. Otherwise, it would be unreasonable to expect people to somehow pluck Bitcoin out of the entire universe of obscure niche products like 'all penny stocks'. But once you pass that filter, all you really needed was to, while reading about interesting fun developments online, simply not let your brains fall out and notice that the critics were not doing even the most basic due diligence or making logically valid arguments and had clear (bad) reasons for opposing Bitcoin, and understand that this implied Bitcoin was undervalued and a financial opportunity. I definitely do not see anything negative about most people for not getting into Bitcoin in 2011, since there's no good ex ante reason for them to have been interested in or read up on it and doing so in general is probably a bad use of time - but for LWers, we had other reasons for being interested enough in Bitcoin to realize what an opportunity it was, so for us it is a bit of a failure to not get involved.

Comment by gwern on How has the cost of clothing insulation changed since 1970 in the USA? · 2020-01-12T23:48:21.486Z · score: 5 (2 votes) · LW · GW

My favorite example is teddy bears:

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-10T03:06:29.310Z · score: 3 (1 votes) · LW · GW

You'll need a bunch in a single passage. If you don't need to disambiguate a large hairball of differently-timed people (like in My Best and Worst Mistake), then you probably shouldn't bother in general.

Would you say that about citations? "Oh, you only use one source in this paragraph, so just omit the author/year/title. The reader can probably figure it out from mentions elsewhere if they really need to anyway." That the use of subscripts is particularly clear when you have a hairball of references (in an example constructed to show benefits) doesn't mean solitary uses are useless.

I'm struggling to see how this is an improvement over "on FB" or "on Facebook" for either the reader or the writer, assuming you don't want to bury-but-still-mention the medium/audience.

It's a matter of emphasis. Yes, you can write it out longhand, much as you can write out any equation or number longhand if necessary: not "22/230" but "twenty-two divided by two-hundred-and-thirty". Natural language is Turing-complete, so to speak: anything you do in a typographic way or a DSL like equations can be done as English (and of course, prior to the invention of various notations, people did write out equations like that, as painful as it is trying to imagine doing algebra while writing everything out without the benefit of even equal-signs). But you usually shouldn't.

Is the mention of being Facebook in that example so important it must be called out like that? I didn't think so. It seemed like the kind of snark a husband might make in passing. Writing it out feels like 'explaining the joke'. Snark doesn't work if you need to surround it in flashing neon lights with arrows pointing inward saying "I am being sarcastic and cynical and ironic here". You can modify the example in your head to something which puts less emphasis on Facebook, if you feel strongly about it.

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-10T01:41:29.119Z · score: 3 (1 votes) · LW · GW

I don't think they're confusingly different. See the "A single unified notation..." part. Distinguishing the two typographically is codex chauvinism.

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-09T23:35:45.808Z · score: 3 (1 votes) · LW · GW

Yes, seems sensible: hard to go wrong if you copy the Pandoc syntax. You'll need to add a mention of this to the LW docs, of course, because the existing docs don't mention sub/superscript either way, and users might assume that LW still copies the Reddit behavior of no-support.
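For reference, the Pandoc sub/superscript syntax being discussed uses tilde and caret delimiters; a minimal illustration (per the Pandoc manual):

```markdown
H~2~O        <!-- subscript: H₂O -->
2^10^        <!-- superscript: 2¹⁰ -->
P~a\ cat~    <!-- spaces inside the delimiters must be backslash-escaped -->
```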

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-09T20:10:09.747Z · score: 3 (1 votes) · LW · GW

It seems worth it in nerdy circles (i.e. among people who’re already familiar with subscripting) for passages that are dense with jumping around in time as in your chosen example, but I’d expect these sorts of passages to be rare, regardless of the expected readership.

But if passages aren't dense with that or other uses, then you wouldn't need to use subscripting much, by definition....

Perhaps you meant, "assuming that it remains a unique convention, most readers will have to pay a one-time cost of comprehension/dislike as overhead, and only then can gain from it; so you'll need them to read a lot of it to pay off, and such passages may be quite rare"? Definitely a problem. A bit less of one if I were to start using it systematically, though, since I could assume that many readers will have read one of my other writings using the convention and had already paid the price.

Also, it’s unclear why “on Facebook” deserves to be compressed into an evidential.

Because it brings out the contrast: one is based on first-hand experience & observation, and the other is later socially-performative kvetching for an audience such as family or female acquaintances. The medium is the message, in this case.

At the very least, “FB” isn’t immediately obvious what it refers to, whereas a date is easier to figure out from context.

I waffled on whether to make it 'FB' or 'Facebook'. I thought "FB" as an abbreviation was sufficiently widely known at this point to make it natural. But maybe not, if even LWers are thrown by it.

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-09T17:11:51.049Z · score: 8 (3 votes) · LW · GW

On a side note: it really would be nice if we could have normal Markdown subscripts/superscripts supported on LW. It's not like we don't discuss STEM topics all the time, and using Latex is overkill and easy to get wrong if you don't regularly write Tex.

Comment by gwern on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-09T00:21:34.480Z · score: 4 (2 votes) · LW · GW

Yes, this relies heavily on the fact that subscripts are small/compact and can borrow meaning from their STEM uses. Doing it as superscripts, for example, probably wouldn't work as well, because we don't use superscripts for this sort of thing & already use superscripts heavily for other things like footnotes, while some entirely new symbol or layout is asking to fail & would make it harder to fall back to natural language. (If you did it as, say, a third column, or used some sort of 2-column layout like in some formal languages.)

How are you doing inflation adjustment? I mocked up a bunch of possibilities and I wasn't satisfied with any of them. If you suppress one of the years, you risk confusing the reader given that it's a new convention, but if you provide all the variables, it ensures comprehension but is busy & intrusive.

Comment by gwern on Dec 2019 gwern.net newsletter · 2020-01-06T03:05:17.233Z · score: 10 (3 votes) · LW · GW

You're welcome. I didn't want to write it because I don't find it that interesting and such a post always takes a frustrating amount of time to write because one has to dig into the details & jailbreak a lot of papers etc, and I'd rather write more constructive things like about all our GPT-2 projects than yet another demoralizing criticism piece, but people kept mentioning it as if it wasn't awful, so... At least it should spare anyone else, right?

Comment by gwern on Less Wrong Poetry Corner: Walter Raleigh's "The Lie" · 2020-01-05T03:45:44.118Z · score: 25 (8 votes) · LW · GW

Sometimes it may take a thief to catch a thief. If it was written in 1592, Raleigh was at his height then, and had much opportunity to see inside the institutions he attacks.

I'm reminded of a book review I wrote last week about famed psychologist Robert Rosenthal's book on bias and error in psychology & the sciences.

Rosenthal writes lucidly about how experimenter biases can skew results or skew the analysis or cause publication bias (which he played a major role in raising awareness of & in developing meta-analysis), gives many examples, and proposes novel & effective measures like result-blind peer review. A veritable former-day Ioannidis, you might say. But in the same book, he shamelessly reports some of the worst psychological research ever done, like the 'Pygmalion effect', which he helped develop meta-analysis to defend (despite its nonexistence), and the book is a tissue of unreplicable absurd effects from start to finish, and Rosenthal has left a toxic legacy of urban legends and statistical gimmicks which are still being used to defend psi, among other things.

Something something the line goes through every human heart...

Comment by gwern on Parameter vs Synapse? · 2019-12-29T05:36:36.905Z · score: 6 (3 votes) · LW · GW

Drexler's recent AI whitepaper had some arguments in a similar vein about functional equivalence and necessary compute and comparing CNNs with the retina or visual cortex, so you might want to look at that.

Comment by gwern on More on polio and randomized clinical trials · 2019-12-28T19:16:17.957Z · score: 21 (6 votes) · LW · GW

One idea occurred to me that I haven’t heard anyone suggest: the trial didn’t have to be 50-50. With a large enough group, you could hold back a smaller subset as the control (80-20?). Again, you need statistics here to tell you how this affects the power of your test.

You can see that as just a simple version of an adaptive trial, with one step. I don't think it in any way resolves the basic problem people have: if it's immoral to give half the sample the placebo, it's not exactly clear why giving a fifth the sample the placebo is moral.
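The statistical cost of an unequal split is easy to quantify: the standard error of a difference in means scales with sqrt(1/n_treat + 1/n_control), so an 80-20 split is noticeably noisier than 50-50 at the same total n. A quick sketch (unit outcome variance assumed for illustration):

```python
import math

def se_of_difference(n_total, control_fraction):
    """Standard error of (treatment mean - control mean), unit outcome variance."""
    n_control = n_total * control_fraction
    n_treat = n_total - n_control
    return math.sqrt(1 / n_treat + 1 / n_control)

se_5050 = se_of_difference(1000, 0.5)   # equal split
se_8020 = se_of_difference(1000, 0.2)   # hold back 20% as controls
penalty = (se_8020 / se_5050) ** 2      # extra sample needed to match equal-split power
```

With the same total sample, the 80-20 design's standard error is 25% larger, so you would need roughly 56% more subjects to match the 50-50 design's power: the ethical saving in placebo subjects is partly paid back in extra enrollment.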

So, the tests had to be more than scientifically sound. They had to be politically sound. The trials had to be so conclusive that it would silence even jealous critics using motivated, biased reasoning. They had to prove themselves not only to a reasoning mind, but to a committee. A proper RCT was needed for credibility as much as, or more than, for science.

This is an important point. One thing I only relatively recently understood about experiment design was something Gelman has mentioned in passing on occasion: an ideal Bayesian experimenter doesn't randomize!

Why not? Because, given their priors, there is always another allocation rule which still accomplishes the goal of causal inference (the allocation rule makes its decisions independent of all confounders on average, like randomization, so estimates the causal effect) but does so with the same or lower variance, such as using alternating-allocation (so the experimental and control group always have as identical n as possible, while simple randomization one-by-one will usually result in excess n in one group - which is inefficient). These sorts of rules pose no problem and can be included in the Bayesian model of the process.

The problem is that it will then be inefficient for observers with different priors, who will learn much less. Depending on their priors or models, it may be almost entirely uninformative. By using explicit randomization and no longer making allocations which are based on your priors in any way, you sacrifice efficiency, but the results are equally informative for all observers. If you model the whole process and consider the need to persuade outside observers in order to implement the optimal decision, then randomization is clearly necessary.
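The variance claim about allocation rules can be checked by simulation. A minimal toy sketch (unit-variance outcomes, fixed additive treatment effect; all numbers illustrative): coin-flip randomization lets the group sizes drift apart, which inflates the variance of the estimated effect relative to alternating allocation.

```python
import random
import statistics

def estimate_sd(allocator, n_subjects=10, effect=1.0, trials=5000, seed=0):
    """SD of the estimated treatment effect under a given allocation rule."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        treat, control = [], []
        for i in range(n_subjects):
            outcome = rng.gauss(0, 1)
            if allocator(i, rng):
                treat.append(outcome + effect)
            else:
                control.append(outcome)
        if treat and control:  # skip degenerate all-one-group draws
            estimates.append(statistics.mean(treat) - statistics.mean(control))
    return statistics.stdev(estimates)

sd_coinflip = estimate_sd(lambda i, rng: rng.random() < 0.5)  # simple randomization
sd_alternating = estimate_sd(lambda i, rng: i % 2 == 0)       # deterministic alternation
```

The alternating rule comes out with a smaller SD, as expected: it guarantees a 50-50 split every time, while per-subject coin flips waste some sample on unbalanced splits.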

Comment by gwern on Finding a quote: "proof by contradiction is the closest math comes to irony" · 2019-12-26T18:14:30.822Z · score: 14 (4 votes) · LW · GW

You probably read my "One Man's Modus Ponens" page, where I quote a Timothy Gowers essay on proof by contradiction and he says (and then goes on to discuss two ways to regard the irrationality of √2 as compared with complex numbers):

...a suggestion was made that proofs by contradiction are the mathematician’s version of irony. I’m not sure I agree with that: when we give a proof by contradiction, we make it very clear that we are discussing a counterfactual, so our words are intended to be taken at face value. But perhaps this is not necessary. ...

...Integers with this remarkable property are quite unlike the integers we are familiar with: as such, they are surely worthy of further study.

...Numbers with this remarkable property are quite unlike the numbers we are familiar with: as such, they are surely worthy of further study.

Comment by gwern on Why the tails come apart · 2019-12-25T21:49:50.482Z · score: 5 (2 votes) · LW · GW

I have found something interesting in the 'asymptotic independence' order statistics literature: apparently it's been proven since 1960 that the extremes of two correlated distributions are asymptotically independent (obviously when r != 1 or -1). So as you increase n, the probability of double-maxima decreases to the lower bound of 1/n.

The intuition here seems to be that n increases faster than increased deviation for any r, which functions as a constant-factor boost; so if you make n arbitrarily large, you can arbitrarily erode away the constant-factor boost of any r, and thus decrease the max-probability.

I suspected as much from my Monte Carlo simulations (Figure 2), but nice to have it proven for the maxima and minima. (I didn't understand the more general papers, so I'm not sure what other order statistics are asymptotically independent: it seems like it should be all of them? But some papers need to deal with multiple classes of order statistics, so I dunno - are there order statistics, like maybe the median, where the probability of being the same order in both samples doesn't converge on 1/n?)
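The asymptotic-independence behavior is easy to see numerically. A rough Monte Carlo sketch (bivariate normal built by mixing independent Gaussians; trial counts arbitrary): as n grows for fixed r, the probability that the same item is the maximum on both variables shrinks toward the 1/n independence floor.

```python
import math
import random

def p_shared_max(n, r, trials=3000, seed=1):
    """P(the same index is the max of both variables) for n bivariate-normal draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            xs.append(z1)
            ys.append(r * z1 + math.sqrt(1 - r * r) * z2)  # correlated r with xs
        if xs.index(max(xs)) == ys.index(max(ys)):
            hits += 1
    return hits / trials
```

With r = 0.5, the shared-max probability at n = 200 is far below its value at n = 10, while at fixed n the correlated case still beats the r = 0 baseline of 1/n: correlation acts like a constant-factor boost that larger n erodes away.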

Comment by gwern on Neural networks as non-leaky mathematical abstraction · 2019-12-19T18:57:51.425Z · score: 4 (3 votes) · LW · GW

So is this an argument for the end-to-end principle?

Comment by gwern on George's Shortform · 2019-12-17T22:12:15.560Z · score: 3 (1 votes) · LW · GW

Terminology sometimes used to distinguish between 'good' and 'bad' stress is "eustress" vs "distress".

Comment by gwern on What Are Meetups Actually Trying to Accomplish? · 2019-12-16T03:20:03.345Z · score: 3 (1 votes) · LW · GW

'marginal'?

Comment by gwern on Under what circumstances is "don't look at existing research" good advice? · 2019-12-13T18:42:45.443Z · score: 8 (3 votes) · LW · GW

Since you mention physics, it's worth noting Feynman was a big proponent of this for physics, and seemed to have multiple reasons for it.

Comment by gwern on Minicamps on Rationality and Awesomeness: May 11-13, June 22-24, and July 21-28 · 2019-12-13T14:48:29.922Z · score: 6 (2 votes) · LW · GW

If you have relatively few choices and properties are correlated (as of course they are), I'm not sure how much it matters. I did a simulation of this for embryo selection with n=10, and partially randomized the utility weights made little difference.

Comment by gwern on Planned Power Outages · 2019-12-11T22:01:22.690Z · score: 5 (2 votes) · LW · GW

(Quite a lot is public outside Google, I've found. It's not necessarily easy to find, but whenever I talk to Googlers or visit, I find out less than I expected. Only a few things I've been told genuinely surprised me, and honestly, I suspected them anyway. Google's transparency is considerably underrated.)

Comment by gwern on Why the tails come apart · 2019-12-11T21:37:51.343Z · score: 5 (2 votes) · LW · GW

Heh. I've sometimes thought it'd be nice to have a copy of Eureqa or the other symbolic tools, to feed the Monte Carlo results into and see if I could deduce any exact formula given their hints. I don't need exact formulas often but it's nice to have them. I've noticed people can do apparently magical things with Mathematica in this vein. All proprietary AFAIK, though.

Comment by gwern on Why the tails come apart · 2019-12-11T16:09:56.117Z · score: 5 (2 votes) · LW · GW

You can simulate it out easily, yeah, but the exact answer seems more elusive. I asked on CrossValidated whether anyone knew the formula for 'probability of the maximum on both variables given a r and n', since it seems like something that order statistics researchers would've solved long ago because it's interesting and relevant to contests/competitions/searches/screening, but no one's given an answer yet.

Comment by gwern on Bíos brakhús · 2019-12-10T16:47:42.803Z · score: 3 (1 votes) · LW · GW

PDFs support hyperlinks: they can define anchors at arbitrary points within themselves for a hyperlink, and they can hyperlink out. You can even specify a target page in a PDF which doesn't define any usable anchors (which is dead useful and I use it all the time in references): eg https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/pdf_open_parameters.pdf#page=5

So I guess the issue here is having a tool which parses and edits PDFs to insert hyperlinks. That's hard. Even if you solve the lookup problem by going through something like Semantic Scholar (the way I use https://ricon.dev/ on gwern.net for reverse citation search), PDFs aren't made for this: when you look at a bit of text which is the name of a book or paper, it may not even be text, it may just be an image... Plus, your links will die. You shouldn't trust any of those sites to stay up long-term at the exact URLs they are at.

Comment by gwern on Is there a website for tracking fads? · 2019-12-09T23:27:16.802Z · score: 8 (4 votes) · LW · GW

Calling something a 'fad' has many of the same problems as calling something a 'bubble'. It's an invitation to selective reasoning. As Sumner likes to point out, most of the things which get called a 'bubble' never turn out to be that, it was just an insult and then a bunch of cherrypicked examples and flexible reasoning (think of all the people who called Bitcoin a bubble when it collapsed to a price far higher than when they called it a bubble).

I think you could get something more useful from a more neutral formulation, like specific cultural artifacts. So two recent relevant papers which come to mind would be https://www.nature.com/articles/s41467-019-09311-w and http://barabasi.com/f/995.pdf. You could do a post hoc analysis and operationalize 'fad' as anything which rose and fell with a sufficiently steep average slope. (Obviously, anything which rises rapidly and then never decays, or only slowly decays, doesn't match what anyone would think of as a 'fad', or rises slowly and decays slowly etc.)
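That post hoc operationalization can be made concrete. A crude sketch (function name and thresholds are arbitrary illustrations, not a validated definition): flag a time series as fad-like if it rises sharply to its peak and then loses a large fraction of that peak.

```python
def looks_like_fad(series, min_rise=2.0, min_decay=0.5):
    """Fad-like: peaks at >= min_rise x its starting level, then loses at
    least min_decay of the peak afterwards. Thresholds are arbitrary."""
    peak_index = max(range(len(series)), key=series.__getitem__)
    peak = series[peak_index]
    rose = peak >= min_rise * series[0]
    fell = min(series[peak_index:]) <= (1 - min_decay) * peak
    return rose and fell

looks_like_fad([1, 3, 9, 4, 2])   # rise-and-crash -> True
looks_like_fad([1, 2, 3, 4, 5])   # monotone growth, not a fad -> False
```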

Comment by gwern on Reading list: Starting links and books on studying ontology and causality · 2019-12-07T01:36:59.038Z · score: 8 (3 votes) · LW · GW

I recently read gwern's excellent "Why Correlation Usually ≠ Causation" notes, and, like any good reading, felt a profound sense of existential terror that caused me to write up a few half-formed thoughts on it. How exciting!

It may not be directly related, but I'd like to highlight that I just added a new section contextualizing the essay as a whole and explaining how it connects to the rest of my beliefs about correlation & causality: https://www.gwern.net/Causality#overview-the-current-situation

As for the broader question of where do our ontologies come from: I'd take a pragmatic point of view and point out that they must have evolved like the rest of us because thinking is for actions.

Comment by gwern on Is there a website for tracking fads? · 2019-12-06T18:29:02.721Z · score: 4 (2 votes) · LW · GW

How would you define 'fad' in an objective and non-pejorative way?

Comment by gwern on What are some non-purely-sampling ways to do deep RL? · 2019-12-05T00:33:21.465Z · score: 10 (5 votes) · LW · GW

You mean stuff like model-predictive control and planning? You can use backprop to do gradient ascent over a sequence of actions if you have a differentiable environment and/or reward model. This also has a lot of application to image CNNs: reversing GANs to encode an image for editing, optimizing to maximize a particular class (like maximally 'dog' or 'NSFW' images) etc. I cover some of the uses and history in https://www.gwern.net/Faces#reversing-stylegan-to-control-modify-images

My most recent suggestion in this vein was about OA/Christiano's preference learning, using gradient ascent directly on trajectories/strings, which avoids explicit sampling and rating in an environment.
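The gradient-ascent-over-actions idea can be illustrated without any NN machinery. A toy sketch (1-D linear dynamics, quadratic costs, hand-derived gradients; purely illustrative, not the actual preference-learning setup): because the environment is differentiable, the whole action sequence can be optimized directly.

```python
def plan_actions(start, target, horizon=5, action_cost=0.1, lr=0.1, steps=200):
    """Optimize a sequence of actions by gradient descent through a
    differentiable toy environment: state evolves as s += a_t, with a
    terminal cost (s - target)^2 plus action_cost * a_t^2 per action."""
    actions = [0.0] * horizon
    for _ in range(steps):
        final_state = start + sum(actions)        # roll out the linear dynamics
        g_terminal = 2 * (final_state - target)   # d(terminal cost)/d(a_t), same for all t
        actions = [a - lr * (g_terminal + 2 * action_cost * a) for a in actions]
    return actions

acts = plan_actions(0.0, 1.0)  # converges to equal small steps toward the target
```

By symmetry the optimum spreads the movement evenly over the horizon, trading off terminal error against action penalties; with a neural environment or reward model, backprop supplies the same gradients automatically.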

Comment by gwern on "The Bitter Lesson", an article about compute vs human knowledge in AI · 2019-12-02T02:27:14.618Z · score: 3 (1 votes) · LW · GW

And MuZero, which applies to ALE very well?

Comment by gwern on CO2 Stripper Postmortem Thoughts · 2019-12-01T14:50:47.232Z · score: 19 (6 votes) · LW · GW

Verifying the behavioural effects is much harder

Not really. There's scads of behavioral measures you can collect passively.

you’d need to avoid unblinding,

No, you don't, and blinding is easy if you think about it for a few seconds; see the comment I left well before yours.

and ideally have several different people with varying levels of age, fitness etc,

No, you don't; you are letting the perfect be the enemy of the better.

and then you’d get affected by weather, unless your house is very well sealed...

This is a feature, not a bug.

Comment by gwern on CO2 Stripper Postmortem Thoughts · 2019-12-01T00:46:31.661Z · score: 14 (5 votes) · LW · GW

I have the relevant air sensor, it'd be really hard to blind it because it makes noise, and the behavioral effects thing is a good idea, thank you.

Just randomizing would be useful; obviously, your air sensor doesn't care in the least if it is 'blinded' or not. And if it's placed in a room you don't go into, that may be enough. As well, maybe you can modify it to have a flap or door or obstruction which opens or closes, greatly changing the rate of CO2 absorption, and randomize that; or if you have someone willing to help, they can come in every n time units to replace the filler or not, giving you both blinded & randomized comparisons between high-CO2-removal vs low-CO2-removal conditions based on whether they pulled out the used filler or not, since the fan presumably still makes the same noise regardless of whether it has brand-new filler removing CO2 at maximum rates or expired tired filler removing only a little CO2. (Remember, experiments work fine comparing 100% removal rates to, say, 10% removal rates; it doesn't have to be exactly 'on'/'off', that's just a bit more statistically-efficient because it has a slightly larger effect size, and you have to remember the estimate is a bit lower than the 'true' estimate because the 'off' condition has 10% of the benefits of the 'on'.)
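A sketch of what the randomization and comparison could look like (hypothetical names and numbers, not a tested protocol):

```python
import random
import statistics

# Hypothetical sketch of the suggested design: randomize each day (or
# each filler swap) to a high-removal vs low-removal condition, log a
# passively-collected behavioral measure per day, then compare
# condition means.
def make_schedule(n_days, seed=0):
    rng = random.Random(seed)
    return [rng.choice(["high-removal", "low-removal"]) for _ in range(n_days)]

def estimated_effect(schedule, measurements):
    high = [m for cond, m in zip(schedule, measurements) if cond == "high-removal"]
    low = [m for cond, m in zip(schedule, measurements) if cond == "low-removal"]
    # Difference in means; as noted above, this underestimates the
    # 'true' effect if the low condition still removes some CO2.
    return statistics.mean(high) - statistics.mean(low)
```

The helper keeping the schedule separate from the measurements is what lets a collaborator do the filler swaps while the experimenter stays blinded.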

Comment by gwern on CO2 Stripper Postmortem Thoughts · 2019-11-30T21:50:04.604Z · score: 54 (20 votes) · LW · GW

It's good that you built it, but it seems to me that now you have a prototype, before you start investing in a patent or business to sell scaled-up versions, it'd make more sense to invest $100 in a CO2 air sensor and a Raspberry Pi with a switch to randomly turn it on/off: to verify that it decreases CO2 as much and as long as expected, whether you can tell when CO2 levels have been lowered, and whether this has any measurable behavioral effects. The value of such information is very high: there is no point in scaling up a design which isn't working at its basic task of lowering CO2 levels, and commercialization will be difficult if it does nothing observable (especially given our questions about how and whether CO2 does anything) and, perhaps more importantly from a marketing perspective, if the user can't feel it doing something.

Comment by gwern on What are the requirements for being "citable?" · 2019-11-29T18:24:57.479Z · score: 11 (5 votes) · LW · GW

GS is just an automated web scraping and search engine service. It can't be picky the way WP is. If you use the necessary <meta> tags (which can be auto-generated from the existing user+title metadata), it's unclear why it wouldn't be picked up by GS eventually.
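For concreteness, the Highwire Press-style tags that GS's inclusion guidelines describe look something like this (all values are placeholders, not gwern.net's actual metadata):

```html
<!-- Illustrative citation_* meta tags of the sort Google Scholar's
     crawler looks for; values here are made up for the example. -->
<meta name="citation_title" content="Example Page Title">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_publication_date" content="2019/11/29">
<meta name="citation_pdf_url" content="https://www.example.com/paper.pdf">
```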

Serendipitously, I added those <meta> tags to gwern.net just last week after a Twitter discussion about how and whether to get DOIs. At least in theory, I shouldn't need to get DOIs to get better visibility, but as a backup, I also registered a GS profile which may or may not let me enter pages manually. In my experience, GS updates quite slowly (I think because GS seems to be a skeleton crew passion project and not integrated into regular Google Search), so we'll see in a few months if any of this did any good.

As far as WP goes, I'm not sure what can be done. As a group blog anyone can post to, being a LW post can never confer any particular notability or RSness on its own. It has to be judged case by case. Given the ever more deletionist approach of the remaining WP editors, that will be difficult even for cases where a LW post is a very good writeup on something and would make an excellent External Link addition, unless it is something like an official statement or interview etc. (Obviously, if a MIRI staffer posts an official piece of MIRI news, there would be no problem citing it in the MIRI article. Stuff like that.) Wikipedians crave status, and so the best way really is to somehow promote pieces into formal publications somewhere (anywhere) Peer-Reviewed™ and treat the LW post as a preprint.

Comment by gwern on Could someone please start a bright home lighting company? · 2019-11-29T02:31:55.565Z · score: 3 (1 votes) · LW · GW
Comment by gwern on [1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | Arxiv · 2019-11-21T15:12:58.606Z · score: 9 (5 votes) · LW · GW

Meta-learning and transfer learning. You take over 100 million different simulated worlds, and the actual real world is a doddle.

Comment by gwern on Do we know if spaced repetition can be used with randomized content? · 2019-11-17T23:29:18.564Z · score: 9 (6 votes) · LW · GW

I proposed this idea years back as dynamic or extended flashcards. Because spaced repetition works for learning abstractions (studies of which presumably entail generalization from a training set of flashcards to a held-out validation set), there doesn't seem to be any reason to expect SRS to fail when the flashcard set is itself very large or randomly generated. Khan Academy may be an example of this: they are supposed to use spaced repetition in scheduling reviews, apparently based on Leitner, and they also apparently use randomly generated or at least templated questions in some lessons (just mathematics?).
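A toy sketch of such a dynamic-flashcard setup, pairing a randomized question template with a classic Leitner box schedule (all names are illustrative, not Khan Academy's or any real SRS's API):

```python
import random

# Sketch of "dynamic flashcards": a template generates a fresh random
# instance at each review, while an ordinary Leitner box schedules the
# *template* itself rather than any fixed card.
def addition_card(rng):
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    return f"{a} + {b} = ?", str(a + b)

class LeitnerDeck:
    def __init__(self, templates, n_boxes=5, seed=0):
        self.rng = random.Random(seed)
        self.n_boxes = n_boxes
        self.boxes = {t: 0 for t in templates}  # box 0 = reviewed most often

    def review(self, template, answer_was_correct):
        box = self.boxes[template]
        # Promote on success, demote to box 0 on failure (classic Leitner).
        self.boxes[template] = (min(box + 1, self.n_boxes - 1)
                                if answer_was_correct else 0)

    def draw(self, template):
        # A new random instance every time the template comes up.
        return template(self.rng)
```

The point of the sketch: the scheduling machinery never needs the flashcard set to be finite or fixed, only the templates.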

(Incidentally, while we're discussing spaced repetition variations, I'm also pleased with my idea of "anti-spaced repetition" as useful for reviewing notes or scheduling media consumption.)

Comment by gwern on An optimal stopping paradox · 2019-11-12T18:51:14.133Z · score: 4 (2 votes) · LW · GW

Claim that there should be a finite lifetime. You can't wait forever. If there is a finite lifetime, then the same decision analysis would tell you to procrastinate until the very end. This effectively is procrastinating forever. It does not converge to a reasonable finite waiting time as your lifetime goes to infinity.

If I am a quasi-immortal who will live millions or billions of years, with, apparently, zero discount rates, no risk, and nothing else I am allowed to invest in (no opportunity cost), why shouldn't I make investment decisions which take millions of years to mature (with astronomical loads of utility at the end as a payoff for my patience), and plan over periods that short-lived impatient mayflies like yourself can scarcely comprehend?

Comment by gwern on Experiments and Consent · 2019-11-12T03:08:25.889Z · score: 23 (5 votes) · LW · GW

The claim was that A/B testing was "not as good a tool for measuring long term changes in behavior" and I'm saying that A/B testing is a very good tool for that purpose.

And the paper you linked showed that it wasn't being done for most of Google's history. If Google doesn't do it, I would be doubtful if anyone, even a peer like Amazon, does. Is it such a good tool if no one uses it?

By 2013 they were certainly already taking into account long-term value, even on mobile (which was pretty small until just around 2013). This section isn't saying "we set the threshold for the number of ads to run too high" but "we were able to use our long-term value measurements to better figure out which ads not to run".

Which is just another way of saying that before then they hadn't used their long-term value measurements to figure out what threshold of ads to run. Whether 2015 or 2013, this is damning. (As are, of course, the other ones I collate, with the exception of Mozilla, who don't dare make an explosive move like shipping adblockers installed by default, so the VoI to them is minimal.)

The result which would have been exculpatory is if they said, "we ran an extra-special long-term experiment to check we weren't fucking up anything, and it turns out that, thanks to all our earlier long-term experiments dating back many years which were run on a regular basis as a matter of course, we had already gotten it about right! Phew! We don't need to worry about it after all. Turns out we hadn't A/B-tested our way into a user-hostile design by using wrong or short-sighted metrics. Boy it sure would be bad if we had designed things so badly that simply reducing ads could increase revenue so much." But that is not what they said.

Comment by gwern on Experiments and Consent · 2019-11-11T22:07:25.878Z · score: 19 (5 votes) · LW · GW

And, as that paper inadvertently demonstrates (among others, including my own A/B testing), most companies manage to not run any of those long-term experiments and do things like overload ads to get short-term revenue boosts at the cost of both user happiness and their own long-term bottom line.

That includes Google: note that at the end of a paper published in 2015, for a company which has been around for a while in the online ad business, let us say, they are shocked to realize they are running way too many ads and can boost revenue by cutting ad load.

Ads are the core of Google's business and the core of all A/B testing as practiced. Ads are the first, second, third, and last thing any online business will A/B test, and if there's time left over, maybe something else will get tested. If even Google can fuck that up for so long so badly, what else are they fucking up UI-wise? A fortiori, what else is everyone else online fucking up even worse?

Comment by gwern on Pieces of time · 2019-11-11T18:53:40.635Z · score: 23 (9 votes) · LW · GW

One of the unexpected side-effects I noticed while doing Uberman polyphasic sleep in my various failed attempts way back in 2009 or so was an unpleasant sensation of being unmoored in time: with a lot of little naps wrapping around the clock, there were no clear 'start' or 'end' times, just one day sliding into another. (I get a similar feeling, at a much lower level, when I travel in the Midwest.) The chronic tiredness and mental dullness from the polyphasic sleep didn't help either.

Comment by gwern on Recent updates to gwern.net (2016-2017) · 2019-11-10T19:46:52.490Z · score: 3 (1 votes) · LW · GW

Secondly, if your interpretation were his intended one, he could have done any number of things to suggest it!

He did do any number of things to suggest it!

Nor do any of his out-of-universe quotes indicate he misunderstands. For example, just recently the topic of time travel came up on Hsu's podcast and Chiang says

...the first Terminator film does posit a fixed timeline. And you know, this is something I'm interested in, and yeah, there's a sense in which "What's Expected Of Us" falls into this category, also the story "The Merchant and the Alchemist's Gate" falls into this category, and there's even a sense in which for my first collection, "Story of Your Life", falls in this category.

Actually being able to see the future, in terms of information flowing backwards in a self-consistent timeline, is what "What's Expected Of Us" considers; and "The Merchant and the Alchemist's Gate" uses physical movement backwards. What, then, makes "Story of Your Life" not in the same category as either of those (especially the former), and in fact doing something so different that it has to be qualified with the very vague 'even a sense in which'? (Because in "Story", the 'time travel' is pseudo time travel, involving neither information nor matter moving backwards in time, and is purely a psychological perspective.)

If your interpretation were Chiang's, he would have to be intentionally misdirecting the audience to a degree you only see from authors like Nabokov, and not leaving any clue except for flawed science, which is common enough for dramatic license in science fiction that it really can't count as a clue. I doubt Chiang is doing that.

I don't mind comparing Chiang with a writer like Nabokov. Nabokov is like Chiang in some ways - for example, they are both very interested in science (eg Nabokov's contributions to lepidopterology).

It's just a more boring story the way you see it!

I strongly disagree. Making Louise some sort of 'Cassandra' with handwavy woo quantum SF is thoroughly boring. The psychological version is much more interesting and far more worthy of 'speculative fiction' and Chiang's style of worldbuilding.

Comment by gwern on For the metaphors · 2019-11-10T00:38:25.509Z · score: 7 (3 votes) · LW · GW

Wittgenstein has another similar metaphor (Zettel, pg 934):

447. Disquiet in philosophy might be said to arise from looking at philosophy wrongly, seeing it wrong, namely as if it were divided into (infinite) longitudinal strips instead of into (finite) cross strips. This inversion in our conception produces the greatest difficulty. So we try as it were to grasp the unlimited strips and complain that it cannot be done piecemeal. To be sure it cannot, if by a piece one means an infinite longitudinal strip. But it may well be done, if one means a cross-strip.

--But in that case we never get to the end of our work!--Of course not, for it has no end. (We want to replace wild conjectures and explanations by quiet weighing of linguistic facts.)

Comment by gwern on Building Intuitions On Non-Empirical Arguments In Science · 2019-11-09T02:58:59.246Z · score: 7 (3 votes) · LW · GW

"There is no view from nowhere." Your mind was created already in motion and thinks, whether you want it to or not, and whatever ontological assumptions it may start with, it has pragmatically already started with them years before you ever worried about such questions. Your Neurathian raft has already been replaced many times over on the basis of decisions and outcomes.

Comment by gwern on Normative reductionism · 2019-11-05T20:39:19.227Z · score: 5 (2 votes) · LW · GW

Sounds like a Markov property.

Comment by gwern on [Question] When Do Unlikely Events Should Be Questioned? · 2019-11-04T20:22:06.681Z · score: 4 (2 votes) · LW · GW

I don't really know. The likelihood of 'generating an amusing coincidence you can post on social media' is clearly quite high: your 1/160,000 merely examines one kind of amusement, and so obviously is merely an extremely loose lower bound. The more kinds of coincidences you enumerate, the bigger the total likelihood becomes, especially considering that people may be motivated to manufacture stories. There are countless examples (here's a fun recent one on confabulating stories for spurious candidate-gene hits). The process is so heterogeneous and differs so much by area (be much more skeptical of hate crime reports than rolling nat 20s) that I don't think there's really any general approach other than to define a reference class, collect a sample, factcheck, and see how many turn out to be genuine... A lot of SSC posts go into the trouble we have with things like this, such as the 'lizardman constant' or rape accusation statistics.

Personally, considering how many rounds there are in any D&D game, how often one does a check, how many players running games there are constantly, how many people you know within 1 or 2 hops on social media, a lower bound of 1/160,000 for a neutral event is already more than frequent enough for me to not be all that skeptical; as Littlewood notes of his own examples, many involving gambling, on a national basis, such things happen frequently.
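The arithmetic behind that "more than frequent enough" can be sketched in a line (the trial counts are made-up illustrative figures, not measured ones):

```python
# Back-of-the-envelope in the spirit of Littlewood's law: even a
# 1-in-160,000 coincidence becomes near-certain across the number of
# dice rolls happening nationally.
def prob_at_least_one(p, n_trials):
    return 1 - (1 - p) ** n_trials

p = 1 / 160_000
# Suppose a million game sessions with ~100 relevant checks each:
n_checks = 1_000_000 * 100
```

With those assumed numbers, `prob_at_least_one(p, n_checks)` is indistinguishable from 1: someone, somewhere, rolls the coincidence.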

Comment by gwern on [Question] When Do Unlikely Events Should Be Questioned? · 2019-11-03T18:58:57.808Z · score: 13 (3 votes) · LW · GW

I don't believe there is any such estimate because it is fundamentally derivative of human psychology and numerology and culture. Why is 168 a remarkable number but 167 is not? Because of an accident of Chinese telephones. And so on. There is no formula for those. Look at Littlewood's examples or Diaconis & Mosteller 1989. These things do happen.

And you can expand the space of possibilities even more. What if the same person gets 4 identical rolls in a row within a turn? Across 4 turns? What if the first player gets one roll, then the next player gets the same roll, and so on? Would not all of those be remarkable? And note that it would be incorrect to compute 'p^4', because you are looking at a sliding window over an indefinitely long series of rolls: anywhere in that series could be the start of a run of good luck; every roll offers the potential to start a run.
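A quick Monte Carlo illustration of the sliding-window point (a shorter streak and smaller numbers are used so it runs fast; all parameters are illustrative):

```python
import random

# The chance of a streak *somewhere* in a long series of rolls is far
# larger than the naive p^k for k pre-specified rolls, because every
# roll can start a run.
def has_streak(n_rolls, streak_len, face=20, sides=20, rng=None):
    rng = rng or random.Random()
    streak = 0
    for _ in range(n_rolls):
        streak = streak + 1 if rng.randint(1, sides) == face else 0
        if streak >= streak_len:
            return True
    return False

def estimate_streak_prob(n_rolls, streak_len, n_sims=500, seed=0):
    rng = random.Random(seed)
    hits = sum(has_streak(n_rolls, streak_len, rng=rng) for _ in range(n_sims))
    return hits / n_sims
```

For example, two natural 20s back-to-back have naive probability 1/400 for a pre-specified pair of rolls, yet across a 2,000-roll session a pair somewhere is nearly certain.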

Comment by gwern on Rationality Quotes: July 2010 · 2019-11-01T22:13:51.929Z · score: 8 (3 votes) · LW · GW

I can't find any source for this, so it may be apocryphal.