"AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence", Clune 2019 2019-09-10T21:33:08.837Z · score: 14 (4 votes)
August 2019 newsletter (popups.js demo) 2019-09-01T17:52:01.011Z · score: 12 (4 votes)
"Designing agent incentives to avoid reward tampering", DeepMind 2019-08-14T16:57:29.228Z · score: 29 (9 votes)
July 2019 newsletter 2019-08-01T16:19:59.893Z · score: 24 (5 votes)
How Should We Critique Research? A Decision Perspective 2019-07-14T22:51:59.285Z · score: 49 (12 votes)
June 2019 newsletter 2019-07-01T14:35:49.507Z · score: 30 (5 votes)
On Seeing Through 'On Seeing Through: A Unified Theory': A Unified Theory 2019-06-15T18:57:25.436Z · score: 27 (7 votes)
On Having Enough Socks 2019-06-13T15:15:21.946Z · score: 21 (6 votes)
May newsletter 2019-06-01T17:25:11.740Z · score: 17 (5 votes)
"One Man's Modus Ponens Is Another Man's Modus Tollens" 2019-05-17T22:03:59.458Z · score: 34 (5 votes)
April 2019 newsletter 2019-05-01T14:43:18.952Z · score: 11 (2 votes)
Recent updates to (2017–2019) 2019-04-28T20:18:27.083Z · score: 36 (8 votes)
"Everything is Correlated": An Anthology of the Psychology Debate 2019-04-27T13:48:05.240Z · score: 49 (7 votes)
March 2019 newsletter 2019-04-02T14:17:38.032Z · score: 19 (3 votes)
February newsletter 2019-03-02T22:42:09.490Z · score: 13 (3 votes)
'This Waifu Does Not Exist': 100,000 StyleGAN & GPT-2 samples 2019-03-01T04:29:16.529Z · score: 39 (12 votes)
January 2019 newsletter 2019-02-04T15:53:42.553Z · score: 15 (5 votes)
"Forecasting Transformative AI: An Expert Survey", Gruetzemacher et al 2019 2019-01-27T02:34:57.214Z · score: 17 (8 votes)
"AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] 2019-01-24T20:49:01.350Z · score: 62 (23 votes)
Visualizing the power of multiple step selection processes in JS: Galton's bean machine 2019-01-12T17:58:34.584Z · score: 27 (8 votes)
Littlewood's Law and the Global Media 2019-01-12T17:46:09.753Z · score: 37 (8 votes)
Evolution as Backstop for Reinforcement Learning: multi-level paradigms 2019-01-12T17:45:35.485Z · score: 18 (4 votes)
December newsletter 2019-01-02T15:13:02.771Z · score: 20 (4 votes)
Internet Search Tips: how I use Google/Google Scholar/Libgen 2018-12-12T14:50:30.970Z · score: 54 (13 votes)
November 2018 newsletter 2018-12-01T13:57:00.661Z · score: 35 (8 votes)
October links 2018-11-01T01:11:28.763Z · score: 31 (8 votes)
Whole Brain Emulation & DL: imitation learning for faster AGI? 2018-10-22T15:07:54.585Z · score: 15 (5 votes)
New /r/gwern subreddit for link-sharing 2018-10-17T22:49:36.252Z · score: 45 (13 votes)
September links 2018-10-08T21:52:10.642Z · score: 18 (6 votes)
Genomic Prediction is now offering embryo selection 2018-10-07T21:27:54.071Z · score: 39 (14 votes)
August links 2018-09-25T15:57:20.808Z · score: 18 (5 votes)
July newsletter 2018-08-02T13:42:16.534Z · score: 24 (8 votes)
June newsletter 2018-07-04T22:59:00.205Z · score: 36 (8 votes)
May newsletter 2018-06-01T14:47:19.835Z · score: 73 (14 votes)
$5m cryptocurrency donation to Alcor by Brad Armstrong in memory of LWer Hal Finney 2018-05-17T20:31:07.942Z · score: 48 (12 votes)
Tech economics pattern: "Commoditize Your Complement" 2018-05-10T18:54:42.191Z · score: 97 (27 votes)
April links 2018-05-10T18:53:48.970Z · score: 20 (6 votes)
March link roundup 2018-04-20T19:09:29.785Z · score: 27 (6 votes)
Recent updates to (2016-2017) 2017-10-20T02:11:07.808Z · score: 7 (7 votes)
The NN/tank Story Probably Never Happened 2017-10-20T01:41:06.291Z · score: 2 (2 votes)
Regulatory lags for New Technology [2013 notes] 2017-05-31T01:27:52.046Z · score: 5 (5 votes)
"AIXIjs: A Software Demo for General Reinforcement Learning", Aslanides 2017 2017-05-29T21:09:53.566Z · score: 1 (3 votes)
Keeping up with deep reinforcement learning research: /r/reinforcementlearning 2017-05-16T19:12:04.201Z · score: 3 (4 votes)
"The unrecognised simplicities of effective action #2: 'Systems engineering' and 'systems management' - ideas from the Apollo programme for a 'systems politics'", Cummings 2017 2017-02-17T00:59:04.256Z · score: 9 (8 votes)
Decision Theory subreddit 2017-02-07T18:42:55.277Z · score: 6 (7 votes)
Rationality Heuristic for Bias Detection: Updating Towards the Net Weight of Evidence 2016-11-17T02:51:19.316Z · score: 10 (11 votes)
Recent updates to (2015-2016) 2016-08-26T19:22:02.157Z · score: 27 (29 votes)
The Brain Preservation Foundation's Small Mammalian Brain Prize won 2016-02-09T21:02:02.585Z · score: 43 (45 votes)
Recent updates to (2014-2015) 2015-11-02T00:06:11.241Z · score: 21 (22 votes)
[Link] 2015 modafinil user survey 2015-09-26T17:28:17.324Z · score: 9 (10 votes)


Comment by gwern on "AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence", Clune 2019 · 2019-09-10T23:10:16.391Z · score: 3 (1 votes) · LW · GW

I'm not quite sure what you mean. If you want other manifestos for a more evolutionary or meta-learning approach, DM has one which lays out a bigger proposal around PBT and other things they've been exploring, if not as all-in on evolution as Uber AI has been for years now.

Comment by gwern on An Educational Singularity · 2019-09-09T02:45:36.747Z · score: 5 (2 votes) · LW · GW

Caplan is correct here. There's no 'far transfer' of the sort which might even slightly resemble 'get a 5% discount on all future fields you study'. (Not that we see anyone who exhibits such an 'educational singularity' in practice, anyway.) At best there might be a sort of meta-study-skill which gives a one-off 'far transfer' effect, like learning how to use search engines or spaced repetition, but it's quickly exhausted and of course just one doesn't give any singularity-esque effect.

A more plausible model would be one with pure near-transfer: every field has a few adjacent fields which give, say, a 5% near-transfer. So one could learn physics/chemistry/biology, for example, in 2.9x the time of a single field, rather than the 3x it would take 3 individuals learning the 3 fields separately.
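The arithmetic of this near-transfer chain can be sketched in a toy model (`chain_cost` is my name, purely illustrative):

```python
def chain_cost(n_fields, discount=0.05):
    """Cost (in units of one field learned from scratch) of learning
    n fields in a chain, where each field after the first is adjacent
    to the previous one and so gets a single near-transfer discount."""
    return 1.0 + (n_fields - 1) * (1.0 - discount)

# physics/chemistry/biology: 1.0 + 0.95 + 0.95 = 2.9 field-units,
# versus 3.0 for three individuals learning the fields separately
```

Note the marginal cost never falls below 0.95 per field, so growth stays linear: near-transfer alone gives no compounding and no singularity-esque effect.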

Comment by gwern on Concrete experiments in inner alignment · 2019-09-07T02:54:01.347Z · score: 10 (5 votes) · LW · GW

What do you think of using AIXI.js as a testbed like does?

Comment by gwern on August 2019 newsletter (popups.js demo) · 2019-09-04T23:49:13.686Z · score: 3 (1 votes) · LW · GW

The ad is an experiment:

The alignment is wrong, yes. I got the CSS id wrong when I set up the second one, I guess. Fixed.

Comment by gwern on August 2019 newsletter (popups.js demo) · 2019-09-01T22:45:37.957Z · score: 5 (2 votes) · LW · GW

Obormot's working on it.

Comment by gwern on Link: That Time a Guy Tried to Build a Utopia for Mice and it all Went to Hell · 2019-08-12T21:29:01.453Z · score: 25 (7 votes) · LW · GW

I've summarized my problems with Mouse Utopia:

Comment by gwern on Epistemic Spot Check: The Role of Deliberate Practice in the Acquisition of Expert Performance · 2019-08-09T20:21:16.588Z · score: 8 (4 votes) · LW · GW

Fundamentals of Skill, Welford (1968)

I've uploaded a scan if you want to look.

Comment by gwern on Is there a user's manual to using the internet more efficiently? · 2019-08-05T16:30:58.936Z · score: 3 (1 votes) · LW · GW

Is Net Smart very practical? The introduction sounds more theoretical and generic, and it's a good 7 years old now. (I noticed when I saw references in that link to long-defunct websites like CureTogether.)

Comment by gwern on Is there a user's manual to using the internet more efficiently? · 2019-08-05T16:29:00.706Z · score: 6 (4 votes) · LW · GW

In theory? You just generate a few random samples with the current text as the prefix and display them. In practice, there are already tools to do this: Talk to Transformer does autocomplete. Even better, IMO, is Deep TabNine for programming languages, trained off GitHub.
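The "sample continuations of the current text" loop can be illustrated with a toy stand-in for the language model (a word-level Markov chain here, purely for illustration; a real tool would query GPT-2 instead):

```python
import random
from collections import defaultdict

def train(corpus):
    """Bigram 'language model': map each word to its observed successors."""
    model = defaultdict(list)
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def autocomplete(model, prefix, n_words=5, n_samples=3, seed=0):
    """Sample a few random continuations of the prefix and return them,
    exactly the display-several-candidates pattern of autocomplete UIs."""
    rng = random.Random(seed)
    completions = []
    for _ in range(n_samples):
        words = prefix.split()
        for _ in range(n_words):
            successors = model.get(words[-1])
            if not successors:
                break
            words.append(rng.choice(successors))
        completions.append(" ".join(words))
    return completions
```

A real autocompleter swaps the Markov table for a neural LM but keeps the same prefix-conditioned sampling loop.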

Comment by gwern on What supplements do you use? · 2019-08-01T22:11:30.421Z · score: 10 (4 votes) · LW · GW

The impression I got from asking people like James about metformin side-effects when I was trying a cost-benefit analysis is that most of them have quick onset, like the gastrointestinal distress, and if you can't fix them by modifying the dose, you can simply discontinue the drug, i.e. you have option value. This would reduce the EV a little but is not that big a deal. After all, metformin is one of the most (the most?) widely used chronic prescription drugs in the world & regarded as very safe, so the side effects can't be that bad, one would think.

The question of redundancy with other interventions is a more concerning one. Not all the metformin papers are positive in this regard. Here's a small paper suggesting that metformin blunts the benefits of exercise, and "Metformin alters the gut microbiome of individuals with treatment-naive type 2 diabetes, contributing to the therapeutic effects of the drug", Wu et al 2017, suggests part of metformin's benefits is by changing the microbiome, but of course, exercise or diet or lifestyle changes might also be changing the microbiome in precisely the same way... For diabetics, who have done what little they are able or willing to do, that presumably is not happening enough to cure their diabetes and so the average metformin effect is still worthwhile, but for those more rigorous about longevity, who knows?

I have similar concerns about baby aspirin and everything postulated to involve inflammation, and perhaps also the senolytics as well: they often seem to be hypothesized to be acting through similar pathways (eg inflammation causes/is caused by senescent cells, some say, but if exercise kills senescent cells by inducing autophagy, doesn't that imply it'd be at least partially redundant with taking a senolytic drug?). I'm not sure what could be done here except to directly test the potential for interactions in factorial experiments.

Comment by gwern on How often are new ideas discovered in old papers? · 2019-07-26T01:29:25.096Z · score: 17 (10 votes) · LW · GW

Citations can be used as the metadata. One of the closest corresponding things in cliometrics is the 'sleeping beauty' paper, which, instead of the usual gradual decline in citation rate, suddenly sees a big uptick many years afterwards. The recent 'big teams vs small teams' paper discussed sleeping beauty papers a little. You could also take multiple discovery as quantifying repetition, since one of the most common ways for a multiple to happen is for it to happen in a different field where it is also important/useful but they haven't heard of the original discovery in the first field.
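One quantitative version of this is the 'beauty coefficient' of Ke et al 2015, which scores how far a paper's citation trajectory sags below the straight line drawn from publication to its citation peak (a sketch; the list-of-yearly-counts input format is my assumption):

```python
def beauty_coefficient(citations_per_year):
    """B from Ke et al 2015: sum, over the years up to the citation
    peak, of the gap between the straight line (year 0 -> peak) and
    the actual yearly count, normalized by the actual count. Long
    dormancy followed by a sudden awakening yields a large B."""
    c = citations_per_year
    t_m = max(range(len(c)), key=lambda t: c[t])  # year of peak citations
    if t_m == 0:
        return 0.0
    c0, cm = c[0], c[t_m]
    return sum(((cm - c0) / t_m * t + c0 - c[t]) / max(1, c[t])
               for t in range(t_m + 1))
```

A paper cited steadily from the start scores near 0; decades of silence before rediscovery scores high, which is the citation signature of an old paper containing a 'new' idea.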

There's a nice version of this with Ed Boyden on how old papers helped lead to the hot new 'expansion microscopy' thing (funded, incidentally, by OpenPhil):

Comment by gwern on The Self-Unaware AI Oracle · 2019-07-24T16:24:40.525Z · score: 5 (2 votes) · LW · GW

GPUs aren't deterministic.

Comment by gwern on RAISE AI Safety prerequisites map entirely in one post · 2019-07-18T00:36:05.160Z · score: 21 (7 votes) · LW · GW

As a historical fact, you certainly can invent selective breeding without knowing anything we would consider true: consider Robert Bakewell and the wildly wrong theories of heredity current when he invented line breeding and thus demonstrated that breeds could be created by artificial selection. (It's unclear what Bakewell and/or his father thought genetics was, but at least in practice, he seems to have acted similarly to modern breeding practices in selecting equally on mothers/fathers, taking careful measurements and taking into account offspring performance, preserving samples for long-term comparison, and improving the environment as much as possible to allow maximum potential to be reached.) More broadly, humans had no idea what they were doing when they domesticated everything; if Richard Dawkins is to be trusted, it seems that the folk-genetics belief was that traits are not inherited and everything regressed to an environmental mean, and so one might as well eat one's best plants/animals since it'll make no difference. And even more broadly, evolution has no idea what 'it' is doing for anything, of course.

The problem is, as Eliezer always pointed out, that selection is extremely slow and inefficient compared to design - the stupidest possible optimization process that'll still work within the lifetime of Earth - and comes with zero guarantees of any kind. Genetic drift might push harmful variants up, environmental fluctuations might extinguish lineages, reproductively fit changes which Goodhart the fitness function might spread, nothing stops a 'treacherous turn', evolved systems tend to have minimal modularity and are incomprehensible, evolution will tend to build in instrumental drives which are extremely dangerous if there is any alignment problem (which there will be), sexual selection can drive a species extinct, evolved replicators can be hijacked by replicators on higher levels like memetics, any effective AGI design process will need to learn inner optimizers/mesa-optimizers which will themselves be unpredictable and only weakly constrained by selection, and so on. If there's one thing that evolutionary computing teaches, it's that these are treacherous little buggers indeed (Lehman et al 2018). The optimization process gives you what you ask for, not what you wanted.

So, you probably can 'evolve' an AGI, given sufficient computing power. Indeed, considering how many things in DL or DRL right now take the form of 'we tried a whole bunch of things and X is what worked' (note that a lot of papers are misleading about how many things they tried, and tell little theoretical stories about why their final X worked, which are purely post hoc) and only much later do any theoreticians manage to explain why it (might) work, arguably that's how AI is proceeding right now. Things like doing population-based training for AlphaStar or NAS to invent EfficientNet are just conceding the obvious and replacing 'grad student descent' with gradient descent.

The problem is, we won't understand why they work, won't have any guarantees that they will be Friendly, and they almost certainly will have serious blindspots/flaws (like adversarial examples or AlphaGo's 'delusions' or how OA5/AlphaStar fell apart when they began losing despite playing apparently at pro level before). NNs don't know what they don't know, and neither do we.

Nor are these flaws easy to fix with just some more tinkering. Much like computer security, you can't simply patch your way around all the problems with software written in C (as several decades of endless CVEs have taught us); you need to throw it out and start with formal methods to make errors like buffer overflows impossible. Adversarial examples, for instance: I recall that one conference had something like 5 adversarial defenses, all defined heuristically without proof of efficacy, and all of them were broken between the time of submission and the actual conference. Or AlphaGo's delusions couldn't be fixed, despite quite elaborate methods being used to produce Master (which at least had a better ELO), until they switched to the rather different architecture of AlphaZero. Neither OA5 nor AlphaStar has been convincingly fixed that I know of; they simply got better to the point where human players couldn't exploit them without a lot of practice to find reproducible ways of triggering blindspots.

So, that's why you want all the math. So you can come up with provably Friendly architectures without hidden flaws which simply haven't been triggered yet.

Comment by gwern on Against NHST · 2019-07-16T17:15:46.008Z · score: 5 (2 votes) · LW · GW

"From Statistical Significance To Effect Estimation: Statistical Reform In Psychology, Medicine And Ecology", Fidler 2005; a broad but still in depth thesis on the history of NHST and attempts to reform it.

Comment by gwern on How Should We Critique Research? A Decision Perspective · 2019-07-15T22:12:19.701Z · score: 5 (3 votes) · LW · GW

Does the abstract not work for you?

Comment by gwern on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-07-08T17:06:16.576Z · score: 7 (3 votes) · LW · GW

Avalon is another example, with better current performance.

Comment by gwern on Musings on Cumulative Cultural Evolution and AI · 2019-07-07T20:50:17.486Z · score: 4 (2 votes) · LW · GW

Tomasello’s work stresses mindreading, in particular the ability for humans to carry joint attention [link].

Link seems to be missing.

Comment by gwern on Is AlphaZero any good without the tree search? · 2019-07-02T15:12:45.691Z · score: 3 (1 votes) · LW · GW

I didn't say anything about chess or shogi because I don't recall any ablation for A0; I just remember the one in the AG0 paper for Go. The AG0 raw network is definitely at or close to professional level and better than 'good amateur'. And I would consider a non-distributed PUCT with no rollouts or other refinements to be a 'simple tree search': it doesn't do any rollouts, and the depth is seriously limited by running on only a single machine w/4 TPUs with a few seconds for search; as the AG0 paper puts it, "Finally, it uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte-Carlo rollouts...we chose to use the simplest possible search algorithm".
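For reference, the PUCT selection rule AG0 uses in place of rollouts can be sketched as a single step (a simplification; variable names are mine, not DeepMind's):

```python
import math

def puct_select(stats, c_puct=1.5):
    """Pick the child maximizing Q + U, where
    U = c_puct * P * sqrt(sum of visits) / (1 + N):
    the network's prior P and averaged value Q steer the search,
    with no Monte-Carlo rollouts at all.
    stats: {action: (visit_count, mean_value, prior_prob)}."""
    total_visits = sum(n for n, _, _ in stats.values())
    def score(item):
        n, q, p = item[1]
        return q + c_puct * p * math.sqrt(total_visits) / (1 + n)
    return max(stats.items(), key=score)[0]
```

The prior term keeps under-visited but promising moves attractive, which is how even a shallow single-machine search adds hundreds of Elo over the raw network's single forward pass.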

Comment by gwern on Is AlphaZero any good without the tree search? · 2019-07-01T13:25:35.192Z · score: 8 (4 votes) · LW · GW

Sure. But the final player does not use MCTS, and it's interesting that it's not necessary then. (It's even more interesting that the way they discovered they didn't need MCTS is by hyperparameter optimization, but that's a different discussion.)

Comment by gwern on Is AlphaZero any good without the tree search? · 2019-07-01T03:17:02.315Z · score: 7 (4 votes) · LW · GW

You want the 'AlphaGo Zero' paper, not the 'AlphaZero' papers, which merely simplify it and reuse it in other domains; the AGZ paper is more informative than the AZ papers. See Figure 6b, and pg. 25 for the tree search:

Figure 6b shows the performance of each program on an Elo scale. The raw neural network, without using any lookahead, achieved an Elo rating of 3,055. AlphaGo Zero achieved a rating of 5,185, compared to 4,858 for AlphaGo Master, 3,739 for AlphaGo Lee and 3,144 for AlphaGo Fan.

So the raw NN, a single pass and selecting the max, is 3k ELO, about 100 ELO under AlphaGo Fan, which soundly defeated a human professional (Fan Hui). I'm not sure whether -100 ELO is enough to demote it to amateur status, but it's at least clearly not that far from professional in the worst case.

Comment by gwern on Is AlphaZero any good without the tree search? · 2019-06-30T20:07:31.342Z · score: 14 (7 votes) · LW · GW

The paper includes the ELO for just the NN. I believe it's professional level but not superhuman, but you should check if you really need to know. However, note that AlphaZero's actual play doesn't use MCTS at all; it uses a simple tree search which only descends a few ply.

Comment by gwern on How can we measure creativity? · 2019-06-29T22:46:41.800Z · score: 9 (4 votes) · LW · GW

A good anthology to read is Creativity, ed Vernon 1970 - it's old but it shows you what people were thinking back when Torrance was trying to come up with creativity tests, and the many psychometric criticisms back then which I'm not sure have been convincingly resolved.

Comment by gwern on Only optimize to 95 % · 2019-06-26T02:43:22.044Z · score: 7 (3 votes) · LW · GW

For EU quantilization. But you could apply quantilization to beliefs rather than the utilities. (I don't think it would work as well because I immediately wonder how many different ways belief quantilization breaks the probability axioms and renders any such agent inconsistent & Dutch-bookable, and this illustrates why people generally discuss EU quantilization instead.)
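For concreteness, a minimal EU-quantilizer sketch (the proposal as I understand it; names, the cutoff, and sample count are illustrative):

```python
import random

def quantilize(base_sample, utility, q=0.05, n=10_000, seed=0):
    """Sample n actions from a base distribution, keep the top q
    fraction by utility, and return a uniform draw from those,
    rather than argmaxing utility over everything."""
    rng = random.Random(seed)
    actions = [base_sample(rng) for _ in range(n)]
    actions.sort(key=utility, reverse=True)
    top = actions[: max(1, int(q * n))]
    return rng.choice(top)
```

Belief quantilization would instead apply the cutoff to the probability distribution itself rather than to sampled actions' utilities, which is where the coherence/Dutch-book worries above come in.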

Comment by gwern on Only optimize to 95 % · 2019-06-25T23:21:29.082Z · score: 19 (9 votes) · LW · GW

The proposal is known as 'quantilization' eg

Comment by gwern on Discussion Thread: The AI Does Not Hate You by Tom Chivers · 2019-06-25T14:46:57.089Z · score: 15 (4 votes) · LW · GW

Ghenlezo review:

Comment by gwern on ISO: Automated P-Hacking Detection · 2019-06-16T23:11:37.806Z · score: 8 (4 votes) · LW · GW

Comment by gwern on On Having Enough Socks · 2019-06-14T16:43:15.124Z · score: 3 (1 votes) · LW · GW

Also, 1 sock lost is automatically 1 pair of socks lost since they can only function in pairs.

Only in the worst case of all-unique pairs. If you buy in batches (as I and a lot of people seem to do), then 1 lost sock is just 1 lost sock until you get down to n=2.
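The batch arithmetic can be made concrete (a toy sketch; the function names are mine):

```python
def wearable_pairs(n_socks):
    """Identical socks pair with each other interchangeably."""
    return n_socks // 2

def pairs_lost(n_socks, n_lost):
    """Pairs forfeited by losing n_lost socks from one identical batch."""
    return wearable_pairs(n_socks) - wearable_pairs(n_socks - n_lost)
```

With all-unique pairs, every lost sock destroys a pair (`pairs_lost(2, 1) == 1`), but from a batch of 8 identical socks, 4 losses cost only 2 pairs: on average a lost sock costs half a pair until the batch runs down.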

Comment by gwern on On Having Enough Socks · 2019-06-13T21:19:35.361Z · score: 6 (3 votes) · LW · GW

In terms of sibling effects, they could be large drains. Imagine a sibling who never bothers to buy their own socks but just unconsciously takes one sock too many once in a while. If there are 2 siblings, now the responsible one must buy twice as many socks as they should (because of the hidden drain). Such people would simply show up as rare purchasers in my survey, and there are quite a few such people. Ones lost in a dryer may be de facto permanently gone: even if you pull the units out a decade later and find them, do you even want to wear them anymore? And what does one do with a mismatched sock? If its mate doesn't show up in a few months, you might toss it or use it for something else entirely, and then should the mate reappear later, now it's a mismatch as well...

I certainly don't lose 8 pairs of socks a year, but then, I don't spend $200+ a month on groceries either.

Comment by gwern on On Having Enough Socks · 2019-06-13T19:22:11.241Z · score: 6 (3 votes) · LW · GW

Samsung says

There are many practical reasons for sock loss rather than supernatural disappearances. Research interviews found the common causes included items falling behind radiators or under furniture without anyone realising, stray items being added to the wrong coloured wash and becoming separated from its matching sock, not being secured to a washing line securely so they fall off and blow away – or they are simply carelessly paired up

And I think they do get lost. In multi-person households, socks have a tendency to migrate to other people's rooms, flowing along a sock gradient. (I lost a lot of socks to my brother. I know because we labeled them with markers and I'd regularly find them in his drawer.) Sometimes they get physically lost in the dryer. In cluttered households, it's easy for a sock to fall out of the dryer or the basket when you're moving a big load, or fall behind drawers/beds and get lost there. Pet animals can steal them: I've seen ferrets making off with socks to hide in corners (or behind the dryer), and supposedly Siamese cats often have a pica just for socks & woolens. And in some cases, there may be things man was not meant to know.

Personally, I think my sock shortage was due more to them wearing out than actually going missing. I'd get rid of them as necessary, but I then didn't buy any replacements.

Comment by gwern on A Plausible Entropic Decision Procedure for Many Worlds Living, Round 2 · 2019-06-09T23:53:12.327Z · score: 9 (4 votes) · LW · GW

This still doesn't seem to address why one should be risk-averse and prioritize an impoverished survival in as many branches as possible. (Not that I think it does even that, given your examples; by always taking the risky opportunities with a certain probability, wouldn't this drive your total quantum measure down rapidly? You seem to have some sort of min-max principle in mind.)

Nor does it deal with the response to the standard ensemble criticism of expected utility: EU-maximization is entirely consistent with non-greedy-EU-maximization strategies (eg the Kelly criterion) as the total-EU-maximizing strategy if the problem, fully modeled, includes considerations like survival or gambler's ruin (eg in the Kelly coinflip game, the greedy strategy of betting everything each round is one of the worst possible things to do, but EU-maximizing over the entire game does in fact deliver optimal results); however, these do not apply at the quantum level, they only exist at the macro level, and it's unclear why MWI should make any difference.
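The Kelly-coinflip point can be seen in a few lines (a toy simulation; the bias, round count, and bankroll here are illustrative, not the original game's parameters):

```python
import random

def final_bankroll(bet_fraction, p=0.6, rounds=50, start=25.0, seed=0):
    """Bet a fixed fraction of bankroll each round on a p-biased,
    even-money coinflip; return the ending bankroll (0.0 = ruin)."""
    rng = random.Random(seed)
    bankroll = start
    for _ in range(rounds):
        stake = bankroll * bet_fraction
        bankroll += stake if rng.random() < p else -stake
        if bankroll <= 0:
            return 0.0
    return bankroll

# Kelly fraction for an even-money bet: f* = 2p - 1 = 0.2 here.
# Greedy per-round EU-maximization says bet everything (f = 1),
# which is ruined by the first tails; f* maximizes long-run growth.
```

The ruin risk only exists because the game has a survival constraint across rounds; that constraint lives at the macro level, which is why it is unclear what extra work MWI is doing in the argument.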

Comment by gwern on My Childhood Role Model · 2019-06-09T16:56:19.028Z · score: 7 (3 votes) · LW · GW

They are characters in the well-known Vinge SF novel A Fire Upon the Deep.

Comment by gwern on Paternal Formats · 2019-06-09T02:40:39.312Z · score: 28 (14 votes) · LW · GW

This name seems unnecessarily intrinsically prejudicial. Perhaps 'legible' vs 'illegible' would be better, or use McLuhan's 'hot' vs 'cool' mediums.

Comment by gwern on Two labyrinths - where would you rather be? · 2019-06-08T23:53:12.641Z · score: 6 (3 votes) · LW · GW

I've always felt that life is more like "A Solar Labyrinth".

Comment by gwern on Disincentives for participating on LW/AF · 2019-05-18T23:07:57.931Z · score: 5 (2 votes) · LW · GW

A rapporteur?

Comment by gwern on What are good practices for using Google Scholar to research answers to LessWrong Questions? · 2019-05-18T22:26:22.442Z · score: 33 (8 votes) · LW · GW

My search guide might be helpful.

Comment by gwern on "One Man's Modus Ponens Is Another Man's Modus Tollens" · 2019-05-18T19:30:44.180Z · score: 3 (1 votes) · LW · GW

Do you think any of the examples are better termed 'modus delens'?

Comment by gwern on Which scientific discovery was most ahead of its time? · 2019-05-17T22:05:36.555Z · score: 6 (2 votes) · LW · GW

Related: "sleeping beauty" papers.

Comment by gwern on Implications of GPT-2 · 2019-05-10T19:38:01.854Z · score: 4 (2 votes) · LW · GW

In what sense is being able to do addition or subtraction with different numbers, for example, which is what it means to learn addition or subtraction, not 'the exact same problem but with different labels'?

Comment by gwern on Implications of GPT-2 · 2019-05-08T20:45:03.753Z · score: 5 (2 votes) · LW · GW

DeepMind has shown that Transformers trained on natural text descriptions of math problems can solve them at well above random: "Analysing Mathematical Reasoning Abilities of Neural Models", Saxton et al 2019:

Mathematical reasoning---a core ability within human intelligence---presents some unique challenges as a domain: we do not come to understand and solve mathematical problems primarily on the back of experience and evidence, but on the basis of inferring, learning, and exploiting laws, axioms, and symbol manipulation rules. In this paper, we present a new challenge for the evaluation (and eventually the design) of neural architectures and similar system, developing a task suite of mathematics problems involving sequential questions and answers in a free-form textual input/output format. The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure-modes of different architectures, as well as evaluate their ability to compose and relate knowledge and learned processes. Having described the data generation process and its potential future expansions, we conduct a comprehensive analysis of models from two broad classes of the most powerful sequence-to-sequence architectures and find notable differences in their ability to resolve mathematical problems and generalize their knowledge.

And this sounds like goal post moving:

unless a very similar problem appears in the training data—e.g. the exact same problem but with different labels

Comment by gwern on Recent updates to (2017–2019) · 2019-05-02T00:26:36.221Z · score: 3 (1 votes) · LW · GW

TWDNE has now been upgraded with samples from an additional 2 months of training on bigger faces, which should make them considerably better:

Comment by gwern on An Apology is a Surrender · 2019-05-01T22:08:41.837Z · score: 20 (6 votes) · LW · GW

An apology experiment: "Does Apologizing Work? An Empirical Test of the Conventional Wisdom", Hanania 2015:

This paper presents the results of an experiment where respondents were given two versions of two real-life controversies involving comments made by public figures. Approximately half of the participants read a story that made it appear as if the person had apologized, while the rest were led to believe that the individual stood firm. In the first experiment, involving Rand Paul and his comments on the Civil Rights Act, hearing that he was apologetic did not change whether respondents were less likely to vote for him. When presented with two versions of the controversy surrounding Larry Summers and his comments about women scientists and engineers, however, liberals and females were much more likely to say that he definitely or probably should have faced negative consequences for his statement when presented with his apology.

Comment by gwern on Recent updates to (2017–2019) · 2019-04-29T15:00:22.621Z · score: 3 (1 votes) · LW · GW

I haven't but I should.

Comment by gwern on Recent updates to (2017–2019) · 2019-04-28T23:44:17.896Z · score: 11 (3 votes) · LW · GW

Maybe. I think you would have to check the metadata field for 'finished', because otherwise there's no definitive criterion: I put up the notes weeks in advance, and they usually aren't finished on the 1st of the month. I don't especially mind manual submission since I have to crosspost to Twitter/Reddit/#lesswrong/TinyLetter anyway.

Comment by gwern on Hull: An alternative to shell that I'll never have time to implement · 2019-04-28T18:30:37.044Z · score: 6 (4 votes) · LW · GW

Oleg's zipper-based 'shell' and 'filesystem' has some similar properties:

Comment by gwern on "Everything is Correlated": An Anthology of the Psychology Debate · 2019-04-27T13:48:58.167Z · score: 8 (4 votes) · LW · GW

(Sort of a very long delayed followup to , tracking down one specific strand of the debate.)

Comment by gwern on Open Thread April 2019 · 2019-04-09T03:05:41.858Z · score: 5 (2 votes) · LW · GW

Those pictures are eight years old, and those particular masks aren’t listed on the store’s website

Is there a reason to not just email & ask (other than depression)?

Comment by gwern on Alignment Newsletter #52 · 2019-04-06T01:54:29.719Z · score: 9 (4 votes) · LW · GW

Looking at the description of that Pavlov algorithm, it bears more than a passing resemblance to REINFORCE or evolutionary methods of training NNs, except with the neurons relabeled 'agents'.

Comment by gwern on Aumann Agreement by Combat · 2019-04-05T15:53:37.023Z · score: 8 (4 votes) · LW · GW

One can’t link to sections within a PDF,

Yes you can. #page=N. That's how I linked to the papers I liked.

Comment by gwern on March 2019 newsletter · 2019-04-04T01:05:05.619Z · score: 6 (3 votes) · LW · GW

Which part? There have cumulatively been a lot of changes.

Comment by gwern on March 2019 newsletter · 2019-04-02T20:25:15.422Z · score: 9 (4 votes) · LW · GW

I am excited and terrified of eyetracking for foveated rendering in VR for precisely those reasons: it will be both awesome & awful and I don't know how it'll net out. (All the more reason to keep paying for VR games, I guess, to help ensure that the user is the customer rather than the product...)