Posts

Yet another Simpson's Paradox Post 2019-12-23T14:20:09.309Z · score: 8 (3 votes)
Billion-scale semi-supervised learning for state-of-the-art image and video classification 2019-10-19T15:10:17.267Z · score: 5 (2 votes)
What are your strategies for avoiding micro-mistakes? 2019-10-04T18:42:48.777Z · score: 18 (9 votes)
What are effective strategies for mitigating the impact of acute sleep deprivation on cognition? 2019-03-31T18:31:29.866Z · score: 26 (11 votes)
So you want to be a wizard 2019-02-15T15:43:48.274Z · score: 16 (3 votes)
How do we identify bottlenecks to scientific and technological progress? 2018-12-31T20:21:38.348Z · score: 31 (9 votes)
Babble, Learning, and the Typical Mind Fallacy 2018-12-16T16:51:53.827Z · score: 5 (3 votes)
NaiveTortoise's Short Form Feed 2018-08-11T18:33:15.983Z · score: 14 (3 votes)
The Case Against Education: Why Do Employers Tolerate It? 2018-06-10T23:28:48.449Z · score: 17 (5 votes)

Comments

Comment by an1lam on A Simple Introduction to Neural Networks · 2020-02-11T17:50:13.960Z · score: 1 (1 votes) · LW · GW

Another possible reason for using squared error is that, from a stats perspective, the Bayes (optimal) estimator under squared-error loss is the mean of the distribution, whereas the optimal estimator under absolute-error loss (MAE) is the median. It's not clear to me that the mean is what you want, but maybe it is?
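
To illustrate the claim numerically (a minimal sketch with a made-up skewed distribution, not anything from the original comment): the constant prediction minimizing mean squared error lands on the mean, while the one minimizing mean absolute error lands on the median.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.lognormal(sigma=1.0, size=100_000)  # skewed, so mean != median

# Grid-search the constant prediction c minimizing each loss.
candidates = np.linspace(0.1, 5.0, 2_000)
mse = [np.mean((samples - c) ** 2) for c in candidates]
mae = [np.mean(np.abs(samples - c)) for c in candidates]

print("argmin MSE:", candidates[np.argmin(mse)], "vs. mean:  ", samples.mean())
print("argmin MAE:", candidates[np.argmin(mae)], "vs. median:", np.median(samples))
```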

Comment by an1lam on Algorithms vs Compute · 2020-02-09T16:31:41.855Z · score: 3 (2 votes) · LW · GW

Weak evidence for compute: apparently the original TD-Gammon code from 1992 performed quite well when run with a modern amount of compute (source).

Comment by an1lam on NaiveTortoise's Short Form Feed · 2020-01-27T15:14:03.598Z · score: 1 (1 votes) · LW · GW

I forgot to include the disclaimer: except for statistical independence tests, which can invalidate graphs but are difficult to apply in practice.

Comment by an1lam on NaiveTortoise's Short Form Feed · 2020-01-26T18:56:09.057Z · score: 5 (4 votes) · LW · GW

It seems like (unless I just haven't discovered it yet) there's a sore need for a framework for causal model comparison, analogous to Bayesian model comparison. If you read Pearl (and his students), they rightly point out that you can't get causal claims without causal assumptions, but they don't talk much about how you actually formulate the causal model in the first place ("domain knowledge"). As a result, if you look at the literature, researchers mostly seem to do inference with a small set of stock causal models, e.g. the classic "instrumental variable" graph, that may or may not describe their phenomena.

I view this as analogous to selecting a prior in applied Bayesian modeling. However, in the Bayesian setting there's a nice set of tools for comparing how likely different models are, whereas I'm not aware of anything comparable in the causal inference world. There's something called "sensitivity analysis", but that's about how much deviation from your assumptions affects your conclusions.
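
For concreteness, the kind of Bayesian tool I have in mind, a Bayes factor between two models of coin-flip data, fits in a few lines (a minimal sketch with made-up counts; the binomial coefficient cancels between the two models, so I drop it):

```python
import numpy as np
from scipy.special import betaln

# Data: k heads in n flips (made-up numbers for illustration).
n, k = 100, 62

# Model 1: fair coin, p = 0.5 exactly.
log_ml_fair = n * np.log(0.5)

# Model 2: p ~ Uniform(0, 1); the marginal likelihood is the Beta function
# B(k + 1, n - k + 1) = integral of p^k (1 - p)^(n - k) dp.
log_ml_uniform = betaln(k + 1, n - k + 1)

bayes_factor = np.exp(log_ml_uniform - log_ml_fair)
print(f"Bayes factor (uniform vs. fair): {bayes_factor:.2f}")
```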

Comment by an1lam on Is there a simple parameter that controls human working memory capacity, which has been set tragically low? · 2020-01-07T01:45:37.474Z · score: 1 (1 votes) · LW · GW

Totally agree!

Comment by an1lam on Is there a simple parameter that controls human working memory capacity, which has been set tragically low? · 2020-01-06T15:50:14.570Z · score: 1 (1 votes) · LW · GW

So, I managed to find what I think is the original study from which this video came and I'm skeptical that it's strong evidence for chimps' working memory exceeding humans'. It does seem like decent evidence for chimp response time being better.

First of all, they picked the best two chimps to compare against humans:

We compared Ai, the best mother performer, Ayumu, the best young performer, and human subjects (n = 9, all university students) in this task.

Second, if I'm reading it correctly, the chimps got more trials in the experiment and had practiced a very similar activity previously.

Each chimpanzee received 10 sessions and each of 9 humans received a single test session.

A ‘masking task’ to test memory was introduced at around the time when the young became five years old. In this task, after touching the first numeral, all other numerals were replaced by white squares. The subject had to remember which numeral appeared in which location, and then touch them based on the knowledge of numerical sequence. All five naïve chimpanzees mastered the masking task, just like Ai.

In other words, the chimps had been practicing a less time-restricted form of the memory activity for years (?) before participating in the experiment. While it's possible that practice wouldn't have improved the human scores, that goes against my prior, based on the fact that humans can dramatically improve at similar activities such as n-back.

Third, the humans actually did better than one of the two (best-performing) chimps, Ai, and did nearly as well as the best chimp, Ayumu, on the longest hold-duration task.

[Figure from the study: chimp (Ai, Ayumu) vs. human performance across hold durations]

From my perspective, this provides evidence that slower human response time may have played a role in the worse performance on the shorter hold tasks. The authors even mention that the shortest duration "is close to the frequency of occurrence of human saccadic eye movement", meaning that a human would only get a flash of the image before it disappeared.

Comment by an1lam on Book Review: Design Principles of Biological Circuits · 2020-01-06T04:05:21.468Z · score: 19 (3 votes) · LW · GW

Excited to see that the author of this book, Uri Alon, just tweeted about this review.

A beautiful and insightful review of Introduction to Systems Biology, 2nd edition, emphasizing the routes to simplicity in biological systems:
https://lesswrong.com/posts/bNXdnRTpSXk9p4zmi/book-review-design-principles-of-biological-circuits.

Comment by an1lam on Homeostasis and “Root Causes” in Aging · 2020-01-05T19:39:28.365Z · score: 4 (3 votes) · LW · GW

(Great post!)

It seems like another implication of this model is that correlated shifts in multiple equilibria on the same timescale provide some evidence of common causes. E.g., DNA damage and cell turnover rates changing at the same time would give some evidence in favor of them being regulated by the same mechanism.

Comment by an1lam on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-03T22:45:34.299Z · score: 3 (2 votes) · LW · GW

I've seen the "ML gets deployed carelessly" narrative pop up on LW a bunch, and while it does seem accurate in many cases, I wanted to note that there are counter-examples. The most prominent counter-example I'm aware of is the incredibly cautious approach DeepMind/Google took when designing the ML system that cools Google's datacenters.

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-12-25T02:00:22.621Z · score: 2 (2 votes) · LW · GW

Thanks, that helps contextualize.

Comment by an1lam on Under what circumstances is "don't look at existing research" good advice? · 2019-12-24T16:07:57.573Z · score: 3 (3 votes) · LW · GW

I like this answer, but do question the point about Feynman's gift being mainly traditional rationality.

One of the other comments here recommends against this unless you are a Feynman-level genius, but I think the causality is backwards on this. Feynman’s gift was traditional rationality, something which comes through very clearly in his writing. He tells these anecdotes in order to teach people how to think, and IMHO his thoughts on thinking are worth paying attention to.

I agree that Feynman portrays it that way in his memoirs, but accounts from other physicists and mathematicians paint a different picture. Here are a few example quotes as evidence that Feynman's gifts also involved quite a bit of "magic" (i.e. skills he developed that one would struggle to learn from observation or even imitation).

First, we have Mark Kac, who worked with Feynman and was no schlub himself, describing two kinds of geniuses of which Feynman was his canonical example of the "magician" type (source):

In science, as well as in other fields of human endeavor, there are two kinds of geniuses: the “ordinary” and the “magicians.” An ordinary genius is a [person] that you and I would be just as good as, if we were only many times better. There is no mystery as to how his mind works. Once we understand what he has done, we feel certain that we, too, could have done it. It is different with the magicians. They are, to use mathematical jargon, in the orthogonal complement of where we are and the working of their minds is for all intents and purposes incomprehensible. Even after we understand what they have done, the process by which they have done it is completely dark. They seldom, if ever, have students because they cannot be emulated and it must be terribly frustrating for a brilliant young mind to cope with the mysterious ways in which the magician’s mind works... Feynman is a magician of the highest caliber.

(Note that, according to this source, Feynman did actually have 35 students, many of whom were quite accomplished themselves, so the point about seldom having students doesn't totally hold for him.)

Sidney Coleman, also no slouch, shared similar sentiments (source):

The generation coming up behind him, with the advantage of hindsight, still found nothing predictable in the paths of his thinking. If anything he seemed perversely and dangerously bent on disregarding standard methods. "I think if he had not been so quick people would have treated him as a brilliant quasi crank, because he did spend a substantial amount of time going down what later turned out to be dead ends," said Sidney Coleman, a theorist who first knew Feynman at Caltech in the 50's.

"There are lots of people who are too original for their own good, and had Feynman not been as smart as he was, I think he would have been too original for his own good," Coleman continued. "There was always an element of showboating in his character. He was like the guy that climbs Mont Blanc barefoot just to show that it can be done."

Feynman continued to refuse to read the current literature, and he chided graduate students who would begin their work on a problem in the normal way, by checking what had already been done. That way, he told them, they would give up chances to find something original.

"I suspect that Einstein had some of the same character," Coleman said. "I'm sure Dick thought of that as a virtue, as noble. I don't think it's so. I think it's kidding yourself. Those other guys are not all a collection of yo-yos. Sometimes it would be better to take the recent machinery they have built and not try to rebuild it, like reinventing the wheel. Dick could get away with a lot because he was so goddamn smart. He really could climb Mont Blanc barefoot."

Coleman chose not to study with Feynman directly. Watching Feynman work, he said, was like going to the Chinese opera. "When he was doing work he was doing it in a way that was just -- absolutely out of the grasp of understanding. You didn't know where it was going, where it had gone so far, where to push it, what was the next step. With Dick the next step would somehow come out of -- divine revelation."

In particular, note the last point about "divine revelation".

Admittedly, these accounts are frustratingly mystical. Stephen Wolfram describes it less mystically and also sheds light on why Feynman's descriptions of his discoveries always made them seem so obvious after the fact even though they weren't (source):

I always found it incredible. He would start with some problem, and fill up pages with calculations. And at the end of it, he would actually get the right answer! But he usually wasn't satisfied with that. Once he'd gotten the answer, he'd go back and try to figure out why it was obvious. And often he'd come up with one of those classic Feynman straightforward-sounding explanations. And he'd never tell people about all the calculations behind it. Sometimes it was kind of a game for him: having people be flabbergasted by his seemingly instant physical intuition, not knowing that really it was based on some long, hard calculation he'd done.

He always had a fantastic formal intuition about the innards of his calculations. Knowing what kind of result some integral should have, whether some special case should matter, and so on. And he was always trying to sharpen his intuition.

Now, it's possible that all this was really typical rationality, but I'm skeptical given that even other all-star physicists and mathematicians found it so hard to understand / replicate.

All that said, I think Feynman's great, and so is deriving stuff from scratch. I just think people often overestimate how much of Feynman's Feynman-ness came from good old-fashioned rationality.

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-12-24T14:37:17.555Z · score: 3 (5 votes) · LW · GW

(Note: responded quickly before removing. I've since edited this comment now that I have more time. Also I'm not the person who downvoted your post.)

I definitely did not intend to cause anyone or their family danger (or harassment, etc.), so I've removed the post.

Mostly in the selfish interest of showing that I wasn't being negligent, I did consider this risk before posting. That's why I noted that I have no information beyond what's already public and was taking into account that since I heard this speculation on a podcast which involved one relatively prominent cryptocurrency person (I won't say who so as not to publicize it further), it seemed unlikely that my post would add additional noise.

All that said, I still agree that even a small chance of harm is more than enough reason to remove the post. Especially since:

  1. it seems like you're more involved in the crypto community than I and therefore probably have more context than I do on this topic; and
  2. my own version of integrity includes not doing things that only don't cause bad outcomes because they're obscure (related to my second point above).

Comment by an1lam on Yet another Simpson's Paradox Post · 2019-12-23T19:59:51.897Z · score: 2 (2 votes) · LW · GW

Nice, I'd neither heard nor thought of this framing before. Thanks!

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-12-23T19:38:56.915Z · score: 3 (4 votes) · LW · GW

(Removed.)

Comment by an1lam on What are your strategies for avoiding micro-mistakes? · 2019-12-23T14:37:44.251Z · score: 3 (2 votes) · LW · GW

I've been more consciously thinking about / playing with the approach recommended in this comment since you made it (I think it's literally impossible to think "without intuition", but treating intuition as a necessary prerequisite was new for me), and it's been helpful, especially in helping me notice the difference between questions where I can "read off" the answer vs. ones where I draw a blank.

However, I've also noticed that it's definitely a sort of "hard mode" in the sense that the more I rely on it, the more it forces me to develop intuitions about everything I'm learning before I can think about them effectively. To give an example, I've been learning more statistics and there are a bunch of concepts about which I currently have no intuition, e.g. the Glivenko–Cantelli theorem. Historically, I would have just filed it away in my head as "thing that is used to prove the variant of Hoeffding's inequality that involves a supremum" or, honestly, forgotten about it. But since I've been trying to consciously practice developing intuitions, I end up spending a bunch of time thinking about what it's really saying, because I no longer trust myself to use it without understanding it.
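
(To make the theorem's statement concrete, here's a minimal numpy sketch of the kind of numerical check I mean; the simulation setup is my own toy, not from anything above. It estimates sup over x of |F_n(x) - F(x)| for growing n, which Glivenko–Cantelli says should go to zero.)

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Glivenko-Cantelli: sup_x |F_n(x) - F(x)| -> 0 almost surely as n grows.
for n in [10, 100, 1_000, 10_000, 100_000]:
    xs = np.sort(rng.normal(size=n))
    ecdf = np.arange(1, n + 1) / n  # F_n evaluated just after each sorted point
    # Check both sides of each jump of F_n, since the sup can occur at either.
    sup_dist = np.max(np.maximum(np.abs(ecdf - norm.cdf(xs)),
                                 np.abs(ecdf - 1 / n - norm.cdf(xs))))
    print(f"n={n:>6}: sup|F_n - F| ≈ {sup_dist:.4f}")
```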

Now, I suspect many people, in particular people from the LW population, would have a response along the lines of, "that's good, you're forcing yourself to deeply understand everything you're learning". And that's partly true! On the other hand, I do think there's something to be said for knowing when to / not to spend time deeply grokking things versus just using them as tools, and by forcing myself to rely heavily on intuitions, it becomes harder to do that.

Related to that, I'd be interested in hearing how you and others go about developing such intuitions. (I've been compiling my own list.)

Comment by an1lam on Neural networks as non-leaky mathematical abstraction · 2019-12-19T21:01:46.502Z · score: 3 (3 votes) · LW · GW

My read was that it's less an argument for the end-to-end principle and more an argument for modular, composable building blocks whose internals you don't need to understand (though I'm not the author).

(Note that my experience of trying new combinations of deep learning components hasn't really matched this. E.g., I've spent a lot of time and effort trying to get new loss functions to work with various deep learning architectures, often with very limited success and often could not get away with not understanding what was going on "under the hood".)

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-12-13T21:35:30.350Z · score: 2 (2 votes) · LW · GW

Link post for a short post I just published describing my way of understanding Simpson's Paradox.

Comment by an1lam on Bayesian examination · 2019-12-11T21:47:05.057Z · score: 1 (1 votes) · LW · GW

If you really want to try and get traction on this, I'd recommend emailing Andrew Gelman (stats blogger and stats prof at Columbia). He's written previously (I can't seem to find the article unfortunately) about how statisticians should take their own ideas more seriously with respect to education and at the very least I can see him blogging about this.

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-12-10T12:45:32.897Z · score: 3 (2 votes) · LW · GW

Weird thought I had based on a tweet about gradient descent in the brain: it seems like one under-explored perspective on computational graphs is the causal one. That is, we can view propagating gradients through the computational graph as assessing the effect of an intervention on a node on all of that node's children (minimal sketch after the list below).

Reason to think this might be useful:

  • *Maybe* this can act as a different lens for examining NN training?

Reasons why this might not be useful:

  • It's not obvious that it makes sense to think of nodes in an NN (or any differential computational graph) as causally related in the sense we usually talk about in causal inference.
  • A causal interpretation of gradients isn't obvious because they're so local, whereas most causal inference focuses on the impact of more non-trivial interventions. OTOH, there are some NN interpretability techniques that try to solve this, so maybe these have better causal interpretations?
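
A minimal sketch of the intervention reading (the toy graph and numbers are mine, purely for illustration): the gradient at an intermediate node matches the finite-difference effect of a small do()-style override of that node.

```python
# Toy computational graph: h = a * b, y = h**2 + b.
# Reading dy/dh causally: the first-order effect on y of do(h := h + eps).

def forward(a, b, h_override=None):
    h = a * b if h_override is None else h_override  # optionally intervene on h
    return h ** 2 + b

a, b = 2.0, 3.0
h = a * b
analytic_grad = 2 * h  # dy/dh computed by hand

eps = 1e-5
intervention_effect = (forward(a, b, h_override=h + eps) - forward(a, b)) / eps

print("analytic dy/dh:          ", analytic_grad)
print("finite-diff intervention:", intervention_effect)
```
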
Comment by an1lam on Connectome-Specific Harmonic Waves · 2019-12-05T02:29:21.909Z · score: 3 (3 votes) · LW · GW

FFNNs’ inability to process time series data was a contributing factor to the Uber self-crashing car.

I don't see the evidence for this in the linked post and don't recall seeing this in any of the other few articles/pieces I've read on the crash. Can you point me to the evidence for this?

Comment by an1lam on Paper-Reading for Gears · 2019-12-05T00:10:31.696Z · score: 9 (3 votes) · LW · GW

I wonder how hard it would be to formalize this claim about mediation in terms of the space of causal DAGs. I haven't done the work to try it so I'm mostly spitballing.

Informally, I associate mediation with the front-door criterion in causality. So, the usefulness of mediation should reflect that the front-door criterion is empirically easier to satisfy than the back-door (and other) criteria, maybe because real causal chains tend to be narrow but long? Thinking about it a bit more, it's probably more that the min cuts of real-world causal graphs tend to be relatively small?
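
For reference, by "front-door" I mean the adjustment for the graph X → M → Y with an unobserved confounder between X and Y, where the mediator M carries all of X's effect:

```latex
P(y \mid \mathrm{do}(x)) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x')\, P(x')
```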

Comment by an1lam on [1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | Arxiv · 2019-11-21T16:22:24.267Z · score: 1 (1 votes) · LW · GW

In other words, the real world is just a mixture distribution over simulations :).

Comment by an1lam on The Power to Draw Better · 2019-11-18T15:10:05.843Z · score: 4 (3 votes) · LW · GW

Any resources you'd recommend that describe the constructive style further? I've read Drawing on the Right Side of the Brain and so would be curious to read about this other approach more.

Comment by an1lam on Chris Olah’s views on AGI safety · 2019-11-16T19:20:10.510Z · score: 7 (2 votes) · LW · GW

I'd be curious whether Chris and the other members of the Clarity team have thought about (in their most optimistic vision) how "microscope AI" could enable other sciences -- biology, neuroscience, physics, etc. -- and if so whether they'd be willing to share. AIUI, the lack of interpretability of neural networks is much more of an immediate issue for trying to use deep learning in these areas because just being able to predict tends to be less useful.

Comment by an1lam on Robin Hanson on the futurist focus on AI · 2019-11-14T16:13:53.839Z · score: 1 (1 votes) · LW · GW

Mostly unrelated to your point about AI, your comments about the 100,000 fans having the potential to cause harm rang true to me.

Are there other areas in which you think the many non-expert fans problem is especially bad (as opposed to computer security, which you view as healthy in this respect)?

Then the experts can be reasonable and people can say, "Okay," and take their word seriously, although they might not feel too much pressure to listen and do anything. If you can say that about computer security today, for example, the public doesn't scream a bunch about computer security.

Would you consider progress on image recognition and machine translation as outside view evidence for lumpiness? Error rates on ImageNet, an image classification benchmark, dropped by more than 10 percentage points over a 4-year period (graph below), mostly due to the successful scaling up of a type of neural network.

This also seems relevant to your point about AI researchers who have been in the field for a long time being more skeptical. My understanding is that most AI researchers would not have predicted such rapid progress on this benchmark before it happened.

That said, I can see how you still might argue this is an example of over-emphasizing a simple form of perception, which in reality is much more complicated and involves a bunch of different interlocking pieces.

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-11-13T23:52:49.041Z · score: 1 (1 votes) · LW · GW

In an interesting turn of events, John Carmack announced today that he'll be pivoting to work on AGI.

Comment by an1lam on Open & Welcome Thread - November 2019 · 2019-11-08T01:05:57.689Z · score: 8 (5 votes) · LW · GW

First of all, I apologize, I think my comment was too snarky and took a tone of "this is so surprising" that I regret on reflection.

Under what circumstances do you feel introducing new policy ideas with the word "maybe this could be a good idea" is acceptable?

To be clear, I think introducing the idea is totally fine, I just have a decently strong prior against widespread bans on "businesses that do this".

I don't expect anyone important to be reading this thread, certainly not important policymakers. Even if they were, I think it was pretty clear I was spitballing.

Agreed, I was not worried about this.

Let's not fall prey to the halo effect. Eliezer also wrote a long post about the necessity of back-and-forth debate, and he's using a platform which is uniquely bad at this. At some point, one starts to wonder whether Eliezer is a mortal human being who suffers from akrasia and biases just like the rest of us.

Fair enough. I agree the Eliezer point isn't strong evidence.

I didn't make much of an effort to assemble arguments that Twitter is bad. But I think there are good arguments out there. How do you feel about the nuclear diplomacy that's happened on Twitter?

I don't have time to respond at length to this part at the moment (I wanted to reply quickly to apologize mostly) but I agree it's the most useful question to discuss and will try to respond more later. To summarize, I acknowledge it's possible that Twitter is bad for the collective but think people may overestimate the bad parts by focusing on how politicians / people fighting about politics use Twitter (which does seem bad) and that even if it is "bad", it's not clear that banning short response websites would lead to a better long-term outcome. For example, maybe people would just start fighting with pictures on Instagram. I don't think this specific outcome is likely but think it's in a class of outcomes that would result from banning that seems decently likely.

Comment by an1lam on Open & Welcome Thread - November 2019 · 2019-11-07T18:46:43.862Z · score: 8 (4 votes) · LW · GW

Woah, this seems like a big jump to a form of technocracy / paternalism that I would think would typically require more justification than spending a short amount of time brainstorming in a comment thread why the thing millions of people use daily is actually bad.

Like, banning sites from offering free services if a character limit is involved because high-status members of communities you like enjoy such sites seems like bad policy, and also a weird way to try and convince these people to write on forums you prefer. Now, one counterargument would be that "coordination problems" mean those writers would prefer to write somewhere else. But presumably if anyone's aware of "inadequate equilibria" like this and able to avoid it, it would be Eliezer.

Re-reading my comment, I realize it may come off as snarky but I'm not really sure how better to convey my surprise that this would be the first idea that comes to mind.

ETA: it's not clear to me that Twitter is so toxic, especially for people in the broad bubble that encompasses LW / EA / tech / etc. I agree it's not the best possible version of a platform by any means, but to say it's obviously net negative seems like a stretch without further evidence.

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-11-05T14:00:23.575Z · score: 12 (3 votes) · LW · GW

Interesting Bill Thurston quote, sadly from his obituary:

I’ve always taken a “lazy” attitude toward calculations. I’ve often ended up spending an inordinate amount of time trying to figure out an easy way to see something, preferably to see it in my head without having to write down a long chain of reasoning. I became convinced early on that it can make a huge difference to find ways to take a step-by-step proof or description and find a way to parallelize it, to see it all together all at once—but it often takes a lot of struggle to be able to do that. I think it’s much more common for people to approach long case-by-case and step-by-step proofs and computations as tedious but necessary work, rather than something to figure out a way to avoid. By now, I’ve found lots of “big picture” ways to look at the things I understand, so it’s not as hard. To prevent mis-interpretation, I think people often look at quotes like this (I've seen similar ones about Feynman) and think "ah yes, see anyone can do it". But IME the thing he's describing is much harder to achieve than the "case-by-case"/"step-by-step" stuff.

Comment by an1lam on Matthew Barnett's Shortform · 2019-11-05T03:57:32.465Z · score: 2 (2 votes) · LW · GW

Weirdly enough, I was doing something today that made me think about this comment. The thought I had is that you caught onto something good here which is separate from the pressure aspect. There seems to be a benefit to trying to separate different aspects of a task more than may feel natural. To use the final exam example, as someone mentioned before, part of the reason final exams feel productive is because you were forced to do so much prep beforehand to ensure you'd be able to finish the exam in a fixed amount of time.

Similarly, I've seen benefit when I (haphazardly since I only realized this recently) clearly segment different aspects of an activity and apply artificial constraints to ensure that they remain separate. To use your VAE blog post example, this would be like saying, "I'm only going to use a single page of notes to write the blog post" to force yourself to ensure you understand everything before trying to write.

YMMV warning: I'm especially bad about trying to produce outputs before fully understanding things, and therefore may get more mileage out of this than others.

Comment by an1lam on Book Review: Design Principles of Biological Circuits · 2019-11-05T02:15:19.805Z · score: 3 (3 votes) · LW · GW

No problem, also, in case it wasn't obvious based on the fact that I commented basically right after you posted this, I liked the post a lot and now plan to read this book ASAP.

Comment by an1lam on Book Review: Design Principles of Biological Circuits · 2019-11-04T23:17:58.749Z · score: 8 (6 votes) · LW · GW

Note: the talk you mentioned was by Drew Endy and the electrical engineering colleague was the one and only Gerald Sussman of SICP fame (source: I don't remember but I'm very confident about being right).

Comment by an1lam on Epistemic Spot Check: The Role of Deliberate Practice in the Acquisition of Expert Performance · 2019-11-03T13:26:59.091Z · score: 2 (2 votes) · LW · GW

In this interview, Don Knuth gives the strong impression that he works more than 4 hours a day not necessarily doing deliberate practice but definitely hard cognitive work (writing a book that most people consider quite challenging to read). That said, Knuth is kind of a monster in general in terms of combining really high technical ability and really high conscientiousness, so it wouldn't surprise me if he's similar to the other outliers you mentioned and is not representative.

A few examples to back up my conscientiousness point:

  • The following story about doing all the problems in the textbook (from that interview):

But Thomas’s Calculus would have the text, then would have problems, and our teacher would assign, say, the even numbered problems, or something like that. I would also do the odd numbered problems. In the back of Thomas’s book he had supplementary problems, the teacher didn’t assign the supplementary problems; I worked the supplementary problems. I was, you know, I was scared I wouldn’t learn calculus, so I worked hard on it, and it turned out that of course it took me longer to solve all these problems than the kids who were only working on what was assigned, at first.

  • Writing an entire compiler on his own over a summer.
  • Finishing his PhD in three years while also consulting with Burroughs.

Comment by an1lam on The Technique Taboo · 2019-10-30T22:18:53.447Z · score: 8 (4 votes) · LW · GW

I agree with this criticism and have made it myself before. However, I do think there's something to it when it comes to especially slow typing (arbitrarily, <= 35 WPM). At least for me, sometimes I need to "clear out my mental buffer" by actually writing out the code I've thought about before I can think about the next thing. When I'm in this position, being able to type relatively quickly seems to help me stay in the flow and get to the next thought faster.

Also, frankly, sometimes you just need to churn out 10s of lines of code that are somewhat mindless (yes, DRY, but I still think some situations require this) and while doing them fast doesn't save you that much time overall, it certainly can help with maintaining sanity.

Related to that, I think the big problem with hunt and peck typing is that it isn't just slower, it also takes your attention off the flow of the code by forcing you to focus on the characters you're typing.

ETA: all that said, I definitely agree with Said that it's not necessary to learn typing in a formal setting and definitely would not encourage colleges to teach it. I actually did have a computer class in elementary school that taught touch typing but still got much faster mostly by using AIM in middle school.

Comment by an1lam on Turning air into bread · 2019-10-30T14:32:36.271Z · score: 1 (1 votes) · LW · GW

Thanks for sharing your perspective!

Comment by an1lam on Turning air into bread · 2019-10-29T22:31:58.617Z · score: 11 (3 votes) · LW · GW

As crazy as the prophet Malthus sounded, some people continuously try to heed his warning; thankfully, we're learning how to coordinate our population growth to support a good life within the limited carrying capacity of our natural resources better and better over time. Time and time again, a wizard makes a new gizmo and we all get away with it.

The modified version of your first paragraph from above feels as or more accurate to me. I'd be curious to hear why you think it won't always be like this (besides X-risk, which I totally understand would lead to a "not always like this" situation).

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-10-29T13:58:44.367Z · score: 6 (3 votes) · LW · GW

Blockchain idea inspired by 80,000 Hours's interview of Vitalik Buterin: a lot of podcasts either have terrible transcriptions or presumably pay a service to transcribe their sessions. However, even these services make minor typos such as "ASX" instead of "ASICs" in the linked interview.

Now, most people who read these transcripts presumably notice at least a subset of these typos but don't want to go through the effort of emailing podcasters to tell them about it. On the flip side, there's no good way for hosts to scalably audit flagged typos to see if they're actually typos. What we really want is a mostly automated mechanism to aggregate flagged typos and accept fixes which multiple people agree upon that only pays people (micro amounts) for correctly identifying typos.

This mechanism seems like something that could live on a blockchain in some sort of smart contract. Obviously, like almost every blockchain application, you *could* do it without a blockchain, but using blockchain makes it easy to audit and distribute the micro-payments rather than having to program the voting scheme from scratch on top of a centralized database.
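
To gesture at the mechanism (a minimal Python sketch of just the aggregation/payout logic; the threshold, bounty size, and names are made up, and the actual chain/payment layer is elided):

```python
from collections import defaultdict

APPROVAL_THRESHOLD = 3  # independent flags needed to accept a fix (made up)
BOUNTY = 0.05           # micro-payment per accepted fix, in some token (made up)

class TypoBounty:
    def __init__(self):
        self.votes = defaultdict(set)  # (transcript, original, fix) -> voters
        self.reporter = {}             # first account to flag each fix
        self.accepted = set()
        self.balances = defaultdict(float)

    def flag(self, account, transcript, original, fix):
        key = (transcript, original, fix)
        self.reporter.setdefault(key, account)
        self.votes[key].add(account)
        # Accept once enough distinct accounts agree; pay the first reporter.
        if key not in self.accepted and len(self.votes[key]) >= APPROVAL_THRESHOLD:
            self.accepted.add(key)
            self.balances[self.reporter[key]] += BOUNTY

book = TypoBounty()
for account in ["alice", "bob", "carol"]:
    book.flag(account, "80k-vitalik-episode", "ASX", "ASICs")
print(book.accepted)  # the ASX -> ASICs fix is accepted
print(book.balances)  # alice, the first flagger, gets the bounty
```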

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-10-24T19:57:41.166Z · score: 3 (2 votes) · LW · GW

If algebra's a deal with the devil where you get the right answer but don't know why, then geometric intuition's a deal with the devil where you always get an answer but don't know whether it's right.

Comment by an1lam on Raemon's Scratchpad · 2019-10-24T19:56:16.073Z · score: 1 (1 votes) · LW · GW

Uh, what if you forget to do your habit troubleshooting habit and then you have to troubleshoot why you forgot it? And then you forget it twice and you have to troubleshoot why you forgot to troubleshoot forgetting to troubleshoot!

(I'm joking about all this in case it's not obvious.)

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-10-24T19:35:35.608Z · score: 3 (3 votes) · LW · GW

Someone should write the equivalent of TAOCP for machine learning.

(Ok, maybe not literally the equivalent. I mean Knuth is... Knuth. So it doesn't seem realistic to expect someone to do something as impressive as TAOCP. And yes, this is authority worship and I don't care. He's Knuth goddamn it.)

Specifically, a book where the theory/math's rigorous but the algorithms are described in their efficient forms. I haven't found this in the few ML books I've read parts of (Bishop's Pattern Recognition and Machine Learning, MacKay's Information Theory, and Hastie, Tibshirani, and Friedman's Elements of Statistical Learning), so if it's already out there, let me know.

Note that I don't mean that whoever does this should do the whole MMIX thing and write their own language and VM.

Comment by an1lam on Raemon's Scratchpad · 2019-10-24T19:34:35.205Z · score: 3 (2 votes) · LW · GW

Just don't get trapped in infinite recursion and end up overloading your habit stack frame!

Comment by an1lam on bgaesop's Shortform · 2019-10-24T19:34:02.056Z · score: 12 (6 votes) · LW · GW

As one data point, I'm a silent (until now) skeptic in the sense that I got really into meditation in college (mostly of the The Mind Illuminated variety), and felt like I was benefiting but not significantly more than I did from other more mundane activities like exercise, eating well, etc. Given the required time investment of ~1 hour a day to make progress, I just decided to stop at some point.

I don't talk about it much because I know the response will be that I never got to the point of "enlightenment" or needed a teacher (both possibilities that I acknowledge), but I figured given your post, I'd leave a short reply mentioning my not necessarily generalizable experience.

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-10-24T00:05:47.284Z · score: 1 (1 votes) · LW · GW

My takeaway from the article was that, to your point, their brains weren't using more energy. Rather, the best hypothesis was just that their adrenal hormones remained elevated for many hours of the day, leading to higher metabolism during that period. For the record, running an hour a day is definitely not enough to burn 6,000 calories (a marathon burns around 3,500).

Maybe I wasn't clear, but that's what I meant by the following.

The article's claiming that chess players burn more energy purely from the side effects of stress, not because their brains are doing more work. So why am I revisiting this question?

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-10-23T14:11:55.288Z · score: 3 (3 votes) · LW · GW

I feel like the center often shifts as I learn more about a topic (because I develop new interests within it). The questions I ask myself are more like "How embarrassed would I be if someone asked me this and I didn't know the answer?" and "How much does knowing this help me learn more about the topic or related topics?" (These aren't ideal phrasings of the questions my gut is asking.)

Those seem like good questions to ask as well. In particular, the second one is something I ask myself although, similar to you, in my gut more than verbally. I also deal with the "center shifting" by revising cards aggressively if they no longer match my understanding. I even revise simple phrasing differences when I notice them. That is, if I repeatedly phrase the answer to a card one way in my head and have it phrased differently on the actual card, I'll change the card.

In my experience, I often still forget things I've entered into Anki either because the card was poorly made or because I didn't add enough "surrounding cards" to cement the knowledge. So I've shifted away from this to thinking something more like "at least Anki will make it very obvious if I didn't internalize something well, and will give me an opportunity in the future to come back to this topic to understand it better instead of just having it fade without detection".

I think both this and the original motivational factor I described apply for me.

I'm confused about what you mean by this. (One guess I have is big-O notation, but big-O notation is not sensitive to constants, so I'm not sure what the 5 is doing, and big-O notation is also about asymptotic behavior of a function and I'm not sure what input you're considering.)

You're right. Sorry about that... I just heinously abuse big-O notation and sometimes forget to not do it when talking with others/writing. Edited the original post to be clearer ("on the order of 10").

I think there are few well-researched and comprehensive blog posts, but I've found that there is a lot of additional wisdom the spaced repetition community has accumulated, which is mostly written down in random Reddit comments and smaller blog posts. I feel like I've benefited somewhat from reading this wisdom (but have benefited more from just trying a bunch of things myself).

Interesting; I've perused the Anki sub-reddit a fair amount but haven't found many posts that do what I'm looking for, which is to both give good guidelines and back them up with specific examples. This is probably the closest thing I've read to what I'm looking for, but even this post mostly focuses on high-level recommendations and doesn't talk about the nitty-gritty, such as different types of cards for different types of skills. If you've saved some of your favorite links, please share!

I agree that trying stuff myself has worked better than reading.

For myself, I've considered writing up what I've learned about using Anki, but it hasn't been a priority because (1) other topics seem more important to work on and write about; (2) most newcomers cannot distinguish between good and bad advice, so I anticipate having low impact by writing about Anki; (3) I've only been experimenting informally and personally, and it's difficult to tell how well my lessons generalize to others.

Regarding other topics being more important, I admit I mostly wrote up the above because I couldn't stop thinking about it rather than based on some sort of principled evaluation of how important it would be. That said, I personally would get a lot of value out of having more people write up detailed case reports, with lots of examples, of how they've been using Anki and what does/doesn't work well for them. I think you're right that this won't necessarily be helpful for newcomers, but I do think it will be helpful for people trying to refine their practice over long periods of time. Given that most advice is targeted at newcomers, while the overall impact may be lower, I'd argue "advice for experts" is more neglected and more impactful on the margin.

Regarding takeaways not generalizing, this is why I think giving lots of concrete examples is good because it basically makes your claims reproducible. That is, someone can go out and try what you described fairly easily and see if it works for them.

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-10-23T14:00:34.746Z · score: 3 (2 votes) · LW · GW

You're right on both counts. Maybe I should've discussed this in my original post... At least for me, Anki serves different purposes at different stages of learning.

Key definitions tend to be useful in the early stages, especially if I'm learning something on and off, as a way to prevent myself from having to constantly refer back and make it easier to think about what they actually mean when I'm away from the source. E.g., I've been exploring alternate interpretations of d-separation in my head during my commute and it helps that I remember the precise conditions in addition to having a visual picture.

Once I've mastered something, I agree that the "concepts and competencies" ("mental moves" is my preferred term) become more important to retain. E.g., I remember the spectral theorem but wish I remembered the sketch of what it looks like to develop the spectral theorem from scratch. Unfortunately, I'm less clear/experienced on using Anki to do this effectively. I think Michael Nielsen's blog post on seeing through a piece of mathematics is a good first step. Deeply internalizing core proofs from an area presumably should help with retaining the core mental moves involved in being effective in that area. But this is quite time intensive and also prioritizes depth over breadth.

I actually did mention two things that I think may help with retaining concepts and competencies - Anki-izing the same concepts in different ways (often visually) and Anki-izing examples of concepts. I haven't experienced this yet, but I'm hopeful that remembering alternative visual versions of definitions, analogies to them, and examples of them may help with the types of problems where you can see the solution at a glance if you have the right mental model (more common in some areas than others). For example, I remember feeling (usually after agonizing over a problem for a while) like Linear Algebra Done Right had a lot of exercises where the right geometric intuition or representative example would allow you to see the solution relatively quickly and then just have to convert it to words.

Another idea, which I haven't tried (yet) but will share anyway, for how to Anki-ize concepts and competencies better is succinctly capturing strategies that pop up again and again in similar forms. To use another Linear Algebra Done Right example, there are a lot of exercises with solutions of the form "construct some arbitrary linear map that does what we want and show that it does". I remember this technique but worry that my pattern matching machinery for the types of problems to which it tends to apply has decayed. On the other hand, if I had an Anki card that just listed short descriptions of a few exercises and asked me which technique was core to their solutions, maybe I'd retain that competency better.

Comment by an1lam on NaiveTortoise's Short Form Feed · 2019-10-23T01:23:24.083Z · score: 19 (7 votes) · LW · GW

Anki's Not About Looking Stuff Up

Attention conservation notice: if you've read Michael Nielsen's stuff about Anki, this probably won't be new for you. Also, this is all very personal and YMMV.

In a number of discussions of Anki here and elsewhere, I've seen Anki's value measured in terms of time saved by not having to look stuff up. For example, Gwern's spaced repetition post includes a calculation of the threshold at which it's worth it to Anki-ize something, although I would be surprised if Gwern hasn't already thought about the claim I'm going to make.

While I occasionally use Anki to remember things that I would otherwise have to Google, e.g. statistics, I almost never Anki-ize things so that I can avoid Googling them in the future. And I don't think in terms of time saved when deciding what to Anki-ize.

Instead, (as Michael Nielsen discusses in his posts) I almost always Anki-ize with the goal of building a connected graph of knowledge atoms about an area in which I'm interested. As a result, I tend to evaluate what to Anki-ize based on two criteria:

  1. Will this help me think about this domain without paper or a computer better?
  2. In the Platonic graph of this domain's knowledge ontology, how central is this node? (Pedantic note: it's easier to visualize distance to the root of the tree, but this requires removing cycles from the graph.)

To make this more concrete, let's look at an example of a topic I've been Anki-izing recently, causal inference. I just started Anki-izing this topic a week ago, so it'll be easier for me to avoid idealizing the process. Looking at my cards so far, I have questions about and definitions of things like "d-separation", "sufficient/admissible sets", and "backdoor paths". Notably, for each of these, I don't just have a cloze card to recall the definition, I also have image cards that quiz me on examples and conceptual questions that clarify things I found confusing upon first encountering these concepts. I've found that making these cards has the effect of both forcing me to ensure I understand concepts (because writing cards requires breaking them down) and makes it easier to bootstrap my understanding over the course of multiple days. Furthermore, knowing that I'll remember at least the stuff I've Anki-ized has a surprisingly strong motivational impact on me on a gut level.

All that said, I suspect there are some people for whom Anki-izing wouldn't be helpful.

The first is people who have the time and a career in which they focus on a narrow enough set of topics such that they repeatedly see the same concepts and rarely go for long periods without revisiting them. I've experienced this myself for Python - I learned it well before starting to use Anki and used it every day for many years. So even if I forget some stuff, it's very easy for me to use the language fluently after time away from it.

The second is, for lack of a better term, actual geniuses. Like, if you're John von Neumann and you legitimately have an approximation of a photographic memory (I'm really skeptical that he actually had an eidetic memory, but regardless...) and can understand any concept incredibly quickly, you probably don't need Anki. Also, if you're the second coming of John von Neumann and you're reading this, cool!

To give another example, Terry Tao is a genius who also has spent his entire life doing math. Probably doesn't need Anki (or advice from me in general in case it wasn't obvious).

Finally, I do think how to use Anki well is an under-explored topic given that there's on the order of 10 actual blog posts about it. Given this, I'm still figuring things out myself, in particular around how to Anki-ize stuff that's more procedural, e.g. "when you see a problem like this, consider these three strategies" or something. If you're also experimenting with Anki, I'd love to hear from you!

Comment by an1lam on Raemon's Scratchpad · 2019-10-20T20:25:38.205Z · score: 2 (2 votes) · LW · GW

In my experience, trade can work well here. That is, you care more about cleanliness than your roommate, but they either care abstractly about your happiness or care about some concrete other thing you care about less, e.g. temperature of the apartment. So, you can propose a trade where they agree to be cleaner than they would be otherwise in exchange for you either being happier or doing something else that they care about.

Semi-serious connection to AI: It's kind of like merging your utility functions but it's only temporary.

Comment by an1lam on Billion-scale semi-supervised learning for state-of-the-art image and video classification · 2019-10-19T20:10:36.540Z · score: 1 (1 votes) · LW · GW

Interesting, I somehow hadn't seen this. Thanks! (Editing to reflect this as well.)

I'm curious: even though this isn't new, do you agree with my vague claim that the fact that this and the paper you linked work at all is relevant to the feasibility of amplification-style strategies?

Comment by an1lam on Declarative Mathematics · 2019-10-19T16:16:28.342Z · score: 3 (2 votes) · LW · GW

To your request for examples, my impression is that Black Box Variational Inference (BBVI) is slowly but surely becoming the declarative replacement for MCMC for a lot of generative modeling.
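
To make "declarative" concrete, here's a minimal BBVI sketch (my own toy, not any particular library's API): conjugate-Gaussian data, a Gaussian variational family, and reparameterized Monte Carlo gradients of the ELBO, with the exact posterior available for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=50)  # data: x_i ~ N(mu, 1), true mu = 2
n = len(x)

# Model: mu ~ N(0, 1), x_i | mu ~ N(mu, 1). Variational family: q(mu) = N(m, s^2).
# BBVI idea: maximize a Monte Carlo estimate of the ELBO via the
# reparameterization trick, treating the log joint's gradient as a black box.

def dlogp_dmu(mu):
    # d/dmu [ log N(mu; 0, 1) + sum_i log N(x_i; mu, 1) ], vectorized over mu
    return -mu + (x.sum() - n * mu)

m, log_s = 0.0, 0.0
lr, n_samples = 1e-3, 64
for step in range(5000):
    s = np.exp(log_s)
    eps = rng.normal(size=n_samples)
    g = dlogp_dmu(m + s * eps)                   # reparameterized: mu = m + s*eps
    m += lr * g.mean()                           # pathwise ELBO gradient wrt m
    log_s += lr * ((g * eps * s).mean() + 1.0)   # + d(entropy)/d(log s) = 1

print(f"BBVI : m ~ {m:.3f}, s ~ {np.exp(log_s):.3f}")
print(f"Exact: m = {x.sum() / (n + 1):.3f}, s = {(n + 1) ** -0.5:.3f}")
```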

Comment by an1lam on What's your big idea? · 2019-10-19T15:48:14.990Z · score: 1 (1 votes) · LW · GW

Have you read any of Cosma Shalizi's stuff on computational mechanics? Seems very related to your interests.