NaiveTortoise's Short Form Feed

post by NaiveTortoise (An1lam) · 2018-08-11T18:33:15.983Z · score: 14 (3 votes) · LW · GW · 53 comments

In light of reading Hazard's Shortform Feed [LW(p) · GW(p)] -- which I really enjoy -- based on Raemon's Shortform feed, I'm making my own. There be thoughts here. Hopefully, this will also get me posting more.

53 comments

Comments sorted by top scores.

comment by NaiveTortoise (An1lam) · 2019-09-02T17:00:53.243Z · score: 20 (10 votes) · LW(p) · GW(p)

Cruxes I Have With Many LW Readers

There's a crux I seem to have with a lot of LWers that I've struggled to put my finger on for a long time but I think reduces to some combination of:

  • faith in elegance vs. expectation of messiness;
  • preference for axioms vs. examples;
  • identification as primarily a scientist/truth-seeker vs. as an engineer/builder.

I tend to be more inclined towards the latter in each case, whereas I think a lot of LWers are inclined towards the former, with the potential exception of the author of realism about rationality [LW · GW], who seems to have opinions that overlap with many of my own. While I still feel uncomfortable with the above binaries, I've now gathered enough examples to at least list them as evidence for what I'm talking about.

Example 1: Linear Algebra Textbooks

A few [LW · GW] LWers [LW · GW] have positively reviewed Linear Algebra Done Right (LADR), in particular complimenting it for revealing the inner workings of Linear Algebra. I too recently read most of this book and did a lot of the exercises. And... I liked it but seemingly less than the other reviewers. In particular, I enjoyed getting a lot of practice reading definition-theorem-proof style math and doing lots of proofs myself, but found myself wishing for more examples and discussion of how to compute things like eigenvalues in practice. While I know that's not what the book's about, the difference I'm pointing to is more that I found the omission of these things bothersome, whereas I suspect the other reviewers were happy with the focus on constructing the different objects mathematically (I'm also obviously making some assumptions here).

On the other hand, I've recently been reading sections of Shilov's Linear Algebra, which is more concrete but does more ugly stuff like present the determinant very early on, and I feel like I'm learning better from it.

I think one contributing factor towards this preference difference is that I tend to be more OK with unmotivated messiness if the messy thing is clearly useful for something but less OK slogging through a bunch of elegant but not-clear-what-it's-used-for build up. Another way to put this would be that I tend to like to get a top-down view of a subject and then go depth-first afterwards, whereas others seem happy to learn bottom-up. I used to think this was because of my experience with programming where algorithms are pretty much always presented in terms of their purpose and tend to become messier as they get optimized for performance. I still like knowing the motivation for things, but I also accept that stuff that works for real applications often has a bunch of messiness. On the other hand, a lot of LWers are also programmers who are only now going deep on math and they seem to still be happy with the axiomatic math way of doing things. So having a programming background doesn't seem to correlate with my preferences that strongly...

What would be great would be if someone would chime in providing better hypotheses/explanations than the one I've given.

Example 2: Scientists vs. Engineers as Role Models

Much of early LW content, the Sequences in particular, used scientists like Einstein and Feynman as role models in discussions (and also targets of criticism in fairness). While I love Feynman and Einstein too, I tend to also revere builders/engineers, such as John Carmack, Jeff Dean, and Konrad Zuse, but these types of people don't seem to get nearly as much praise on LW.

One explanation for this is that great but not necessarily thoughtful engineers can drive X-risk through their work. For example, here's [LW(p) · GW(p)] a discussion where a few folks argue that AGI requires insight more than programming ability and explicitly mention needing Judea Pearl more than John Carmack. While this is a fair argument, I'm skeptical that it's the true rejection. Security mindset seems to be as common among engineers as it is among scientists given that most of the folks who participate in things like DefCon and work in computer security tend to be hardcore engineer types like Trammell Hudson. (In his original essay, Eliezer cites Bruce Schneier, definitely an engineer, as someone he trusts to have security mindset.)

Another potential explanation for this is that LW readers tend to like doing and learning about science (pure math included) more than doing engineering. It's plausible that people who were attracted to early LW/OB content and were compelled by arguments for X-risk tend to also prefer science to engineering.

Conclusion

Unfortunately, I don't have some sort of nice insight to conclude this with. I don't think the differences between my preferences and other LWers' are bad so much as an implicit thing that doesn't get discussed.

I am curious whether my dichotomies seem reasonably accurate to anyone reading this? And if so, do my hypotheses for them seem reasonable?

comment by mr-hire · 2019-09-02T18:08:20.110Z · score: 12 (4 votes) · LW(p) · GW(p)

I have similar differences with many people on LW and agree there is something of an unacknowledged aesthetic here.

comment by jimrandomh · 2019-09-10T01:05:42.962Z · score: 4 (3 votes) · LW(p) · GW(p)

I think the engineer mindset is more strongly represented here than you think, but that the nature of nonspecialist online discussion warps things away from the engineer mindset and towards the scientist mindset. Both types of people are present, but the engineer-mindset people tend not to put that part of themselves forward here.

The problem with getting down into the details is that there are many areas with messy details to get into, and it's hard to appreciate the messy details of an area you haven't spent enough time in. So deep dives in narrow topics wind up looking more like engineer-mindset, while shallow passes over wide areas wind up looking more like scientist-mindset. LessWrong posts can't assume much background, which limits their depth.

I would be happy to see more deep-dives; a lightly edited transcript of John Carmack wouldn't be a prototypical LessWrong post, but it would be a good one. But such posts are necessarily going to exclude a lot of readers, and LessWrong isn't necessarily going to be competitive with posting in more topic-specialized places.

comment by NaiveTortoise (An1lam) · 2019-09-10T02:33:40.011Z · score: 3 (2 votes) · LW(p) · GW(p)

These are all good points.

After I saw that Benito did a transcript post, I considered doing one for one of Carmack's talks or a recent interview of Yann LeCun I found pretty interesting (based on the talks of his I've listened to, LeCun has a pretty engineering-y mindset even though he's nominally a scientist). Not going to happen immediately though since it requires a pretty big time investment.

Alternatively, maybe I'll review Masters of Doom, which is where I learned most of what I know about Carmack.

comment by Pattern · 2019-09-03T00:12:13.320Z · score: 3 (2 votes) · LW(p) · GW(p)
What would be great would be if someone would chime in providing better hypotheses/explanations than the one I've given.

As the dichotomy isn't jumping out at me, I guess I should read both of those books* sometime and see which I like more.

*Linear Algebra Done Right (LADR)

Shilov's Linear Algebra

comment by Ruby · 2019-09-02T17:29:22.642Z · score: 3 (6 votes) · LW(p) · GW(p)

This is really interesting, I'm glad you wrote this up. I think there's something to it.

Some quick comments:

  • I generally expect there to exist simple underlying principles in most domains which give rise to messiness (and often the messiness seems a bit less messy once you understand them). Perceiving "messiness" does also often feel to me like lack of understanding whereas seeing the underlying unity makes me feel like I get whatever the subject matter is.
  • I think I would like it if LessWrong had more engineers/inventors as role models and that it's something of an oversight that we don't. Yet I also feel like John Carmack probably isn't remotely near the level of Pearl (I'm not that familiar with Carmack's work): pushing forward video game development doesn't compare to neatly figuring out what exactly causality itself is.
    • There might be something like all truly monumental engineering breakthroughs depended on something like a "scientific" breakthrough. Something like Faraday and Maxwell figuring out theories of electromagnetism is actually a bigger deal than Edison(/others) figuring out the lightbulb, the radio, etc. There are cases of lauded people who are a little more ambiguous on the science/engineer dichotomy. Turing? Shannon? Tesla? Shockley et al with the transistor seems kind of like an engineering breakthrough, and seems there could be love for that. I wonder if Feynman gets more recognition because as an educator we got a lot more of the philosophy underlying his work. Just rambling here.
  • A little on my background: I did an EE degree which had a very practical focus. My experience is that I was taught how to apply a lot of equations and make things in the lab, but most courses skimped on providing the real understanding, which left me overall worse as an engineer. The math majors actually understood Linear Algebra, the physicists actually understood electromagnetism, and I knew enough to make some neat things in the lab and pass tests, but I was worse off for not having a deeper "theoretical" understanding. So I feel like I developed more of an identity as an engineer, but came to feel that to be a really great engineer I needed to get the core science better*.

*I have some recollection that Tesla could develop a superior AC electric system because he understood the underlying math better than Edison, but this is a vague recollection.

comment by jimrandomh · 2019-09-10T00:36:57.021Z · score: 8 (3 votes) · LW(p) · GW(p)
Yet I also feel like John Carmack probably isn't remotely near the level of Pearl (I'm not that familiar with Carmack's work): pushing forward video game development doesn't compare to neatly figuring out what exactly causality itself is.

You're looking at the wrong thing. Don't look at the topic of their work; look at their cognitive style and overall generativity. Carmack is many levels above Pearl. Just as importantly, there's enough recorded video of him speaking unscripted that it's feasible to absorb some of his style.

comment by Ruby · 2019-09-12T01:54:31.874Z · score: 2 (1 votes) · LW(p) · GW(p)
You're looking at the wrong thing. Don't look at the topic of their work; look at their cognitive style and overall generativity.

By generativity do you mean "within-domain" generativity?

Carmack is many levels above Pearl.

To unpack which "levels" I was grading on, it's something like a blend of "importance and significance of their work" / "difficulty of the problems they were solving", admittedly that's still pretty vague. On those dimensions, it seems entirely fair to compare across topics and assert that Pearl was solving more significant and more difficult problem(s) than Carmack. And for that "style" isn't especially relevant. (This can also be true even if Carmack solved many more problems.)

But I'm curious about your angle - when you say that Carmack is many levels above Pearl, which specific dimensions is that on (generativity and style?) and do you have any examples/links for those?

comment by jimrandomh · 2019-09-12T02:01:17.861Z · score: 2 (1 votes) · LW(p) · GW(p)
By generativity do you mean "within-domain" generativity?

Not exactly, because Carmack has worked in more than one domain (albeit not as successfully; Armadillo Aerospace never made orbit.)

On those dimensions, it seems entirely fair to compare across topics and assert that Pearl was solving more significant and more difficult problem(s) than Carmack

Agree on significance, disagree on difficulty.

comment by NaiveTortoise (An1lam) · 2019-11-13T23:52:49.041Z · score: 1 (1 votes) · LW(p) · GW(p)

In an interesting turn of events, John Carmack announced today that he'll be pivoting to work on AGI.

comment by mr-hire · 2019-09-06T19:33:57.290Z · score: 5 (3 votes) · LW(p) · GW(p)
There might be something like all truly monumental engineering breakthroughs depended on something like a "scientific" breakthrough. Something like Faraday and Maxwell figuring out theories of electromagnetism is actually a bigger deal than Edison(/others) figuring out the lightbulb, the radio, etc. There are cases of lauded people who are a little more ambiguous on the science/engineer dichotomy. Turing? Shannon? Tesla? Shockley et al with the transistor seems kind of like an engineering breakthrough, and seems there could be love for that. I wonder if Feynman gets more recognition because as an educator we got a lot more of the philosophy underlying his work. Just rambling here.

TRIZ is an engineering discipline that has something called the five levels of innovation, which talks about this:

1. You solve a problem by using a common solution in your own speciality.

2. You solve a problem using a common solution in your own industry.

3. You solve a problem using a common solution found in other industries.

4. You solve a problem using a solution built on first principles (e.g. little known scientific principles.)

5. You solve a problem by discovering a new principle/scientific rule.

comment by Ruby · 2019-09-12T01:46:40.283Z · score: 2 (1 votes) · LW(p) · GW(p)

Seems you're referring to this https://en.wikipedia.org/wiki/TRIZ?

comment by mr-hire · 2019-09-12T01:58:44.719Z · score: 2 (1 votes) · LW(p) · GW(p)

Yes.

comment by NaiveTortoise (An1lam) · 2019-09-02T18:34:04.009Z · score: 2 (2 votes) · LW(p) · GW(p)

Thanks for your reply! I agree with a lot of what you said.

First off, thanks for bringing up the point about underlying principles. I agree that there are often underlying principles in many domains and that I also really like to find unity in seeming messiness. I used to be of the more extreme view that principles were in some sense more important than the details, but I've become more skeptical over time for two reasons.

  1. From a pedagogy perspective, I've personally never had much luck learning principles without having a strong base of practice & knowledge. That said, when I have that base, learning principles helps me improve further and is satisfying.
  2. I've realized over time how much of action (where action can include thinking) is based upon a set of non-verbal strategies that one learns through practice and experimentation even in seemingly theoretical domains. These strategies seem to be the secret sauce that allow one to act fluently but seem meaningfully different from the types of principles people often discuss.

Another way to phrase my argument is that principles are important but very hard to transfer between minds. It's possible you agree and I'm just belaboring the point but I wanted to make it explicit.

One concrete example of the distinction I'm drawing is something called the "What Are Monads Fallacy" in the Haskell community, where people try to explain monads by conveying their understanding of what monads really are, even though they themselves learned about monads by just using them a bunch, which led to them later developing a higher level understanding. This reflects a more general problem where experts often struggle to teach novices because they don't realize that their broad understanding is actually founded upon lower level understanding of a lot of details.

I think I would like it if LessWrong had more engineers/inventors as role models and that it's something of an oversight that we don't. Yet I also feel like John Carmack probably isn't remotely near the level of Pearl (I'm not that familiar with Carmack's work): pushing forward video game development doesn't compare to neatly figuring out what exactly causality itself is.

I tentatively agree, but it's pretty hard to draw comparisons. From an insight perspective, I agree that Pearl's work on Bayes Nets and Causality was probably more profound than anything Carmack came up with. From an economic perspective though, Carmack had a massive, albeit indirect, impact on the trajectory of the computing world. By coming up with new algorithms and techniques for 3D game rendering at a time when people had basically no idea how to render 3D games in realtime, Carmack drove the gaming industry forward, which certainly contributed to development of better GPUs and processors as well. Carmack was also the person at Id who insisted on making their games moddable and releasing their game engines, which eventually led to the development of games like Half-Life.

That said, a better point of comparison to Pearl is probably Jeff Dean, who, in close collaboration with Sanjay Ghemawat, first rewrote much of Google's search stack from scratch after it started failing to scale and then subsequently invented BigTable, MapReduce, Spanner, and TensorFlow!

There might be something like all truly monumental engineering breakthroughs depended on something like a "scientific" breakthrough. Something like Faraday and Maxwell figuring out theories of electromagnetism is actually a bigger deal than Edison(/others) figuring out the lightbulb, the radio, etc. There are cases of lauded people who are a little more ambiguous on the science/engineer dichotomy. Turing? Shannon? Tesla?

Agree that science tends to be upstream of later technology developments, but I would emphasize that there are probably cases where without great engineers, the actual applications never get built. For example, there was a large gap between us understanding genes fairly well and being able to sequence and, more recently, synthesize them.

Shockley et al with the transistor seems kind of like an engineering breakthrough, and seems there could be love for that.

I agree with this and would add Lynn Conway, who invented VLSI, one of the key enablers of the modern processor industry and Moore's Law.

A little on my background: I did an EE degree which had a very practical focus. My experience is that I was taught how to apply a lot of equations and make things in the lab, but most courses skimped on providing the real understanding, which left me overall worse as an engineer.

To be clear, I shared this frustration with the engineering curriculum. I started as a Computer Engineering major and switched to CS because I felt like engineering was just a bag of unmotivated tricks whereas in CS you could understand why things were the way they were. However, part of the reason I liked CS's theory was because it was presented in the context of understanding algorithms.

As a final point, I don't think I did a good job in my original post of emphasizing that I'm pro-understanding and pro-theory! I mostly endorse the saying, "nothing is so practical as a good theory." My perceived disagreement is more around how much I trust/enjoy theory for its own sake vs. with an eye towards practice.

comment by Ruby · 2019-09-12T02:23:23.216Z · score: 2 (1 votes) · LW(p) · GW(p)

Sorry for the delayed reply on this one.

I do think we agree on rather a lot here. A few thoughts:

1. Seems there are separate questions of "how you model/role-models and heroes/personal identity" and separate questions of pedagogy.

You might strongly seek unifying principles and elegant theories but believe the correct way to arrive at these and understand these is through lots of real-world messy interactions and examples. That seems pretty right to me.

2. Your examples in this comment do make me update on the importance of engineering types and engineering feats. It makes me think that LessWrong indeed focuses too much on heroes of "understanding" when there are also heroes of "making things happen", which is rather a key part of rationality too.

A guess might be that this is downstream of what was focused on in the Sequences and the culture that set. If I'm interpreting Craft and the Community [? · GW] correctly, Eliezer never saw the Sequences as covering all of rationality or all of what was important, just his own particular sub-art that he created in the course of trying to do one particular thing.

That's my dream—that this highly specialized-seeming art of answering confused questions, may be some of what is needed, in the very beginning, to go and complete the rest.

Seemingly, answering confused questions is more science-y than engineering-y and would place focus on great scientists like Feynman. Unfortunately, the community has not yet supplemented the Sequences with the rest of the art of human rationality, and so most of the LW culture is still (mostly) downstream of the Sequences alone. Given that, we can expect the culture is missing major pieces of what would be the full art, e.g. whatever skills are involved in being Jeff Dean and John Carmack.

My perceived disagreement is more around how much I trust/enjoy theory for its own sake vs. with an eye towards practice.

About that you might be correct. Personally, I do think I enjoy theory even without application. I'm not sure if my mind secretly thinks all topics will find their application, but having applications (beyond what is needed to understand) doesn't feel key to my interest, so something.

comment by NaiveTortoise (An1lam) · 2019-09-12T19:02:29.950Z · score: 9 (5 votes) · LW(p) · GW(p)

At this point, I basically agree that we agree and that the most useful follow up action is for someone (read: me) to actually be the change they want to see and write some (object-level), and ideally good, content from a more engineering-y bent.

As I mentioned in my reply to jimrandomh, a book review seems like a good place for me to start.

comment by Ruby · 2019-09-12T22:24:37.982Z · score: 2 (1 votes) · LW(p) · GW(p)

Cool. Looking forward to it!

comment by NaiveTortoise (An1lam) · 2019-10-23T01:23:24.083Z · score: 19 (7 votes) · LW(p) · GW(p)

Anki's Not About Looking Stuff Up

Attention conservation notice: if you've read Michael Nielsen's stuff about Anki, this probably won't be new for you. Also, this is all very personal and YMMV.

In a number of discussions of Anki here and elsewhere, I've seen Anki's value measured in terms of time saved by not having to look stuff up. For example, Gwern's spaced repetition post includes a calculation of the threshold at which it's worth Anki-izing something, although I would be surprised if Gwern hasn't already thought about the claim I'm going to make.

While I occasionally use Anki to remember things that I would otherwise have to Google, e.g. statistics, I almost never Anki-ize things so that I can avoid Googling them in the future. And I don't think in terms of time saved when deciding what to Anki-ize.

Instead, (as Michael Nielsen discusses in his posts) I almost always Anki-ize with the goal of building a connected graph of knowledge atoms about an area in which I'm interested. As a result, I tend to evaluate what to Anki-ize based on two criteria:

  1. Will this help me think about this domain without paper or a computer better?
  2. In the Platonic graph of this domain's knowledge ontology, how central is this node? (Pedantic note: it's easier to visualize distance to the root of the tree, but this requires removing cycles from the graph.)
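To make the "distance to the root" idea from the second criterion concrete, here is a minimal sketch, assuming the domain is written down as a directed graph of concepts. The concept names and edges below are hypothetical, and breadth-first search is just one convenient way to get a tree-like depth out of a graph with cycles, since it never revisits a node.

```python
from collections import deque


def depth_from_root(graph, root):
    """Return the fewest hops from root to every reachable concept."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            # Skipping already-seen nodes is what effectively removes cycles.
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


# Hypothetical concept graph; an edge means "understanding this leads to that".
concepts = {
    "causal graph": ["d-separation", "backdoor path"],
    "d-separation": ["backdoor path"],
    "backdoor path": ["admissible set", "d-separation"],  # cycle on purpose
    "admissible set": [],
}

depths = depth_from_root(concepts, "causal graph")
for concept, hops in sorted(depths.items(), key=lambda item: item[1]):
    print(hops, concept)
```

Under this (assumed) encoding, concepts with a small depth are the ones closest to the root and so the first candidates for cards.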

To make this more concrete, let's look at an example of a topic I've been Anki-izing recently, causal inference. I just started Anki-izing this topic a week ago, so it'll be easier for me to avoid idealizing the process. Looking at my cards so far, I have questions about and definitions of things like "d-separation", "sufficient/admissible sets", and "backdoor paths". Notably, for each of these, I don't just have a cloze card to recall the definition, I also have image cards that quiz me on examples and conceptual questions that clarify things I found confusing upon first encountering these concepts. I've found that making these cards both forces me to ensure I understand concepts (because writing cards requires breaking them down) and makes it easier to bootstrap my understanding over the course of multiple days. Furthermore, knowing that I'll remember at least the stuff I've Anki-ized has a surprisingly strong motivational impact on me on a gut level.

All that said, I suspect there are some people for whom Anki-izing wouldn't be helpful.

The first is people who have the time and a career in which they focus on a narrow enough set of topics such that they repeatedly see the same concepts and rarely go for long periods without revisiting them. I've experienced this myself for Python - I learned it well before starting to use Anki and used it every day for many years. So even if I forget some stuff, it's very easy for me to use the language fluently after time away from it.

The second is, for lack of a better term, actual geniuses. Like, if you're John von Neumann and you legitimately have an approximation of a photographic memory (I'm really skeptical that he actually had an eidetic memory but regardless...) and can understand any concept incredibly quickly, you probably don't need Anki. Also, if you're the second coming of John von Neumann and you're reading this, cool!

To give another example, Terry Tao is a genius who also has spent his entire life doing math. Probably doesn't need Anki (or advice from me in general in case it wasn't obvious).

Finally, I do think how to use Anki well is an under-explored topic given that there's on the order of 10 actual blog posts about it. Given this, I'm still figuring things out myself, in particular around how to Anki-ize stuff that's more procedural, e.g. "when you see a problem like this, consider these three strategies" or something. If you're also experimenting with Anki, I'd love to hear from you!

comment by riceissa · 2019-10-23T04:10:21.194Z · score: 5 (2 votes) · LW(p) · GW(p)

I would be surprised if Gwern hasn’t already thought about the claim I'm going to make

I briefly looked at gwern's public database several months ago, and got the impression that he isn't using Anki in the incremental reading/learning way that you (and Michael Nielsen) describe. Instead, he seems to just add a bunch of random facts. This isn't to say gwern hasn't thought about this, but just that if he has, he doesn't seem to be making use of this insight.

In the Platonic graph of this domain’s knowledge ontology, how central is this node?

I feel like the center often shifts as I learn more about a topic (because I develop new interests within it). The questions I ask myself are more like "How embarrassed would I be if someone asked me this and I didn't know the answer?" and "How much does knowing this help me learn more about the topic or related topics?" (These aren't ideal phrasings of the questions my gut is asking.)

knowing that I’ll remember at least the stuff I’ve Anki-ized has a surprisingly strong motivational impact on me on a gut level

In my experience, I often still forget things I've entered into Anki either because the card was poorly made or because I didn't add enough "surrounding cards" to cement the knowledge. So I've shifted away from this to thinking something more like "at least Anki will make it very obvious if I didn't internalize something well, and will give me an opportunity in the future to come back to this topic to understand it better instead of just having it fade without detection".

there’s O(5) actual blog posts about it

I'm confused about what you mean by this. (One guess I have is big-O notation, but big-O notation is not sensitive to constants, so I'm not sure what the 5 is doing, and big-O notation is also about asymptotic behavior of a function and I'm not sure what input you're considering.)
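For reference, the point about constants can be spelled out with the standard definition of big-O. This is only a sketch of the usual textbook statement, with $f$, $g$, $c$, and $n_0$ as the conventional symbols rather than anything from the original post.

```latex
f(n) \in O(g(n)) \iff \exists\, c > 0,\ \exists\, n_0 \ \text{such that}\ |f(n)| \le c \cdot g(n) \ \text{for all}\ n \ge n_0

% Taking g(n) = 5: |f(n)| \le c \cdot 5 = (5c) \cdot 1, so the constant is absorbed into c.
O(5) = O(1) = O(10)
```

So read literally, O(5) and O(10) name the same class of bounded functions, which is why the 5 carries no information in the formal notation.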

I think there are few well-researched and comprehensive blog posts, but I've found that there is a lot of additional wisdom the spaced repetition community has accumulated, which is mostly written down in random Reddit comments and smaller blog posts. I feel like I've benefited somewhat from reading this wisdom (but have benefited more from just trying a bunch of things myself). For myself, I've considered writing up what I've learned about using Anki, but it hasn't been a priority because (1) other topics seem more important to work on and write about; (2) most newcomers cannot distinguish between good and bad advice, so I anticipate having low impact by writing about Anki; (3) I've only been experimenting informally and personally, and it's difficult to tell how well my lessons generalize to others.

comment by NaiveTortoise (An1lam) · 2019-10-23T14:11:55.288Z · score: 3 (3 votes) · LW(p) · GW(p)
I feel like the center often shifts as I learn more about a topic (because I develop new interests within it). The questions I ask myself are more like "How embarrassed would I be if someone asked me this and I didn't know the answer?" and "How much does knowing this help me learn more about the topic or related topics?" (These aren't ideal phrasings of the questions my gut is asking.)

Those seem like good questions to ask as well. In particular, the second one is something I ask myself although, similar to you, in my gut more than verbally. I also deal with the "center shifting" by revising cards aggressively if they no longer match my understanding. I even revise simple phrasing differences when I notice them. That is, if I repeatedly phrase the answer to a card one way in my head and have it phrased differently on the actual card, I'll change the card.

In my experience, I often still forget things I've entered into Anki either because the card was poorly made or because I didn't add enough "surrounding cards" to cement the knowledge. So I've shifted away from this to thinking something more like "at least Anki will make it very obvious if I didn't internalize something well, and will give me an opportunity in the future to come back to this topic to understand it better instead of just having it fade without detection".

I think both this and the original motivational factor I described apply for me.

I'm confused about what you mean by this. (One guess I have is big-O notation, but big-O notation is not sensitive to constants, so I'm not sure what the 5 is doing, and big-O notation is also about asymptotic behavior of a function and I'm not sure what input you're considering.)

You're right. Sorry about that... I just heinously abuse big-O notation and sometimes forget to not do it when talking with others/writing. Edited the original post to be clearer ("on the order of 10").

I think there are few well-researched and comprehensive blog posts, but I've found that there is a lot of additional wisdom the spaced repetition community has accumulated, which is mostly written down in random Reddit comments and smaller blog posts. I feel like I've benefited somewhat from reading this wisdom (but have benefited more from just trying a bunch of things myself).

Interesting, I've perused the Anki sub-reddit a fair amount, but haven't found many posts that do what I'm looking for, which is both give good guidelines and back them up with specific examples. This is probably the closest thing I've read to what I'm looking for, but even this post mostly focuses on high level recommendations and doesn't talk about the nitty-gritty such as different types of cards for different types of skills. If you've saved some of your favorite links, please share!

I agree that trying stuff myself has worked better than reading.

For myself, I've considered writing up what I've learned about using Anki, but it hasn't been a priority because (1) other topics seem more important to work on and write about; (2) most newcomers cannot distinguish between good and bad advice, so I anticipate having low impact by writing about Anki; (3) I've only been experimenting informally and personally, and it's difficult to tell how well my lessons generalize to others.

Regarding other topics being more important, I admit I mostly wrote up the above because I couldn't stop thinking about it rather than based on some sort of principled evaluation of how important it would be. That said, I personally would get a lot of value out of having more people write up detailed case reports of how they've been using Anki and what does/doesn't work well for them that give lots of examples. I think you're right that this won't necessarily be helpful for newcomers, but I do think it will be helpful for people trying to refine their practice over long periods of time. Given that most advice is targeted at newcomers, while the overall impact may be lower, I'd argue "advice for experts" is more neglected and more impactful on the margin.

Regarding takeaways not generalizing, this is why I think giving lots of concrete examples is good because it basically makes your claims reproducible. That is, someone can go out and try what you described fairly easily and see if it works for them.

comment by riceissa · 2019-10-23T21:40:42.374Z · score: 3 (2 votes) · LW(p) · GW(p)

If you’ve saved some of your favorite links, please share!

  • I like CheCheDaWaff's comments on r/Anki; see here for a decent place to start. In particular, for proofs, I've shifted toward adding "prove this theorem" cards rather than trying to break the proof into many small pieces. (The latter adheres more to the spaced repetition philosophy, but I found it just doesn't really work.)
  • Richard Reitz has a Google doc with a bunch of stuff.
  • I like this forum comment (as a data point, and as motivation to try to avoid similar failures).
  • I like https://eshapard.github.io
  • Master How To Learn also has some insights but most posts are low-quality.

One thing I should mention is that a lot of the above links aren't written well. See this Quora answer for a view I basically agree with.

I couldn’t stop thinking about it

I agree that thinking about this is pretty addicting. :) I think this kind of motivation helps me to find and read a bunch online and to make occasional comments (such as the grandparent) and brain dumps, but I find it's not quite enough to get me to invest the time to write a comprehensive post about everything I've learned.

comment by TurnTrout · 2019-10-23T03:12:32.529Z · score: 3 (2 votes) · LW(p) · GW(p)

Although I haven't used Anki for math, it seems to me like I want to build up concepts and competencies, not remember definitions. Like, I couldn't write down the definition of absolute continuity, but if I got back in the zone and refreshed myself, I'd have all of my analysis skills intact.

I suppose definitions might be a useful scaffolding?

comment by NaiveTortoise (An1lam) · 2019-10-23T14:00:34.746Z · score: 3 (2 votes) · LW(p) · GW(p)

You're right on both counts. Maybe I should've discussed this in my original post... At least for me, Anki serves different purposes at different stages of learning.

Key definitions tend to be useful in the early stages, especially if I'm learning something on and off, as a way to prevent myself from having to constantly refer back and make it easier to think about what they actually mean when I'm away from the source. E.g., I've been exploring alternate interpretations of d-separation in my head during my commute and it helps that I remember the precise conditions in addition to having a visual picture.

Once I've mastered something, I agree that the "concepts and competencies" ("mental moves" is my preferred term) become more important to retain. E.g., I remember the spectral theorem but wish I remembered the sketch of what it looks like to develop the spectral theorem from scratch. Unfortunately, I'm less clear/experienced on using Anki to do this effectively. I think Michael Nielsen's blog post on seeing through a piece of mathematics is a good first step. Deeply internalizing core proofs from an area presumably should help for retaining the core mental moves involved in being effective in that area. But, this is quite time intensive and also prioritizes breadth over depth.

I actually did mention two things that I think may help with retaining concepts and competencies - Anki-izing the same concepts in different ways (often visually) and Anki-izing examples of concepts. I haven't experienced this yet, but I'm hopeful that remembering alternative visual versions of definitions, analogies to them, and examples of them may help with the types of problems where you can see the solution at a glance if you have the right mental model (more common in some areas than others). For example, I remember feeling (usually after agonizing over a problem for a while) like Linear Algebra Done Right had a lot of exercises where the right geometric intuition or representative example would allow you to see the solution relatively quickly and then just have to convert it to words.

Another idea for how to Anki-ize concepts and competencies better that I haven't tried (yet) but will share anyway is succinctly capturing strategies that pop up again and again in similar forms. To use another Linear Algebra Done Right example, there are a lot of exercises with solutions of the form "construct some arbitrary linear map that does what we want" and show it... does what we want. I remember this technique but worry that my pattern matching machinery for the types of problems to which it tends to apply has decayed. On the other hand, if I had an Anki card that just listed short descriptions of a few exercises and asked me which technique was core to their solutions, maybe I'd retain that competency better.

comment by NaiveTortoise (An1lam) · 2019-09-02T15:11:09.203Z · score: 16 (5 votes) · LW(p) · GW(p)

Watching my kitten learn/play has been interesting from a "how do animals compare to current AIs?" perspective. At a high level, I think I've updated slightly towards RL agents being further along the evolutionary progress ladder than I'd previously thought.

I've seen critiques of RL agents not being able to do long-term planning as evidence for them not being as smart as animals, and while I think that's probably accurate, I have noticed that my kitten takes a surprisingly long time to learn even 2-step plans. For example, when it plays with a toy on a string, I'll often try putting the toy on a chair that it only knows how to reach by jumping onto another chair first. It took many attempts before it learned to jump onto the other chair and then climb to where I'd put the toy, even though it had previously done that while exploring many times. And even then, it seems to be at risk of "catastrophic forgetting" where we'll be playing in the same way later and it won't remember to do the 2-step move. Related to this, its learning is fairly narrow even for basic skills, e.g. I have 4 identical chairs around a table but it will be afraid of jumping onto one even though it's very comfortable jumping onto another.

Now part of this may be that cats are known for being biased towards trial-and-error compared to other similarly-sized mammals like dogs (see Gwern's write-up for more on this) and that adult cats may be better than kittens at "long-term" planning. However, a lot of critiques of RL, such as Josh Tenenbaum's, argue that our AIs don't even compare to young children in terms of their abilities. This is undoubtedly true with respect to ability to actually move around in the world, grasp objects, etc. but seems less true than I'd previously thought with respect to "higher level" cognitive abilities such as planning. To make this more concrete, I'm skeptical that my kitten could currently succeed at a real life analogue to Montezuma's Revenge.

Another thing I've observed relates to some recent work by Konrad Kording, Adam Marblestone, and Greg Wayne on integrating deep learning and neuroscience. They postulate that due to the genomic bottleneck, it's plausible that brains leverage heterogeneous, evolving cost functions to do semi-supervised learning throughout development. While much more work needs to be done investigating this hypothesis (as acknowledged by the authors), it does ring true with some of my observations of my kitten. In particular, I've noticed that it recently became much more interested in climbing things and jumping on objects on its own, whereas previously I couldn't even get it to do so using treats. This seems like a plausible example of a "switch" being flipped that increased reward for being high up (or something, obviously this is quite hand-wavy).

I'm trying to come up with predictions that I can make regarding the next few months based on these two initial observations but don't have any great ideas yet.

comment by NaiveTortoise (An1lam) · 2019-11-05T14:00:23.575Z · score: 12 (3 votes) · LW(p) · GW(p)

Interesting Bill Thurston quote, sadly from his obituary:

I’ve always taken a “lazy” attitude toward calculations. I’ve often ended up spending an inordinate amount of time trying to figure out an easy way to see something, preferably to see it in my head without having to write down a long chain of reasoning. I became convinced early on that it can make a huge difference to find ways to take a step-by-step proof or description and find a way to parallelize it, to see it all together all at once—but it often takes a lot of struggle to be able to do that. I think it’s much more common for people to approach long case-by-case and step-by-step proofs and computations as tedious but necessary work, rather than something to figure out a way to avoid. By now, I’ve found lots of “big picture” ways to look at the things I understand, so it’s not as hard.

To prevent misinterpretation: I think people often look at quotes like this (I've seen similar ones about Feynman) and think "ah yes, see, anyone can do it". But IME the thing he's describing is much harder to achieve than the "case-by-case"/"step-by-step" stuff.

comment by NaiveTortoise (An1lam) · 2019-03-31T23:31:25.586Z · score: 7 (4 votes) · LW(p) · GW(p)

I've recently been obsessing over the idea of trying to "make math more like programming". I'm not sure if it's just because I feel fluent at programming and still not very fluent at abstract math or also because programming really does have a feedback loop that you don't get in math.

Regardless, based on my reading it seems like there's a general consensus in math that even the most modern theorem provers, like Lean and Coq, are much less efficient than typical "informal" math reasoning. That said, I wonder if this ignores some of the benefits that programmers get from writing in a formal language, i.e. automatic refactoring tools, fast feedback loops, and code analysis/search tools. Also, it seems like a sufficiently user-friendly math theorem proving tool could be useful for education. If kids can learn how to program in JavaScript, I have to believe they can learn to prove theorems, even if the learning curve's relatively steep.

Maybe once I play around with Lean more, I'll change my mind, but for now, I'm sticking to my contrarian viewpoint.

comment by Pattern · 2019-06-04T20:20:29.749Z · score: 4 (3 votes) · LW(p) · GW(p)

It seems like a useful idea on a lot of levels.

There's a difference between solving a problem where you're 1) trying to figure out what to do, 2) executing an algorithm, or 3) evaluating a closed-form solution (plugging the values into the equation, performing the operations, and seeing what the number is).***

Names. If you're writing a program, and you decide to give things (including functions/methods) names like the letters of the alphabet it's hard for other people to understand what you're doing. Including future you. As a math enthusiast I see the benefit of not having to generate names*, but teaching wise? I can see some benefits of merging/mixing. (What's sigma notation? It's a for loop.)

Functions. You can say f' is the derivative of f. Or you can get into the fact that there are functions** that take other functions as arguments. You can focus narrowly on functions of one-variable. Or you can notice that + is a function that takes two numbers (just like *, /, ^).

*Like when your idea of what you're doing /with something changes as you go and there's no refactoring tool on paper to change the names all at the last minute. (Though paper feels pretty nice to work with. That technology is really ergonomic.)

**And that the word function has more than one meaning. There's a bit of a difference between a way of calculating something and a lookup table.

***Also, seeing how things generalize can be easier with tools that can automatically check if the changes you've made have broken what you were making. (Writing tests.)
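
The "sigma notation is a for loop" and "functions that take other functions as arguments" points can be sketched in a few lines of Python. This is only an illustration: the names `sigma` and `derivative` are made up for this example, and the derivative is a numerical approximation (central difference), not a symbolic one.

```python
import operator
from typing import Callable


def sigma(term: Callable[[int], float], lower: int, upper: int) -> float:
    """Sum term(i) for i = lower, ..., upper inclusive, like the big-sigma symbol."""
    total = 0.0
    for i in range(lower, upper + 1):  # the for loop hiding inside the notation
        total += term(i)
    return total


def derivative(f: Callable[[float], float], h: float = 1e-6) -> Callable[[float], float]:
    """Take a function f and return a new function approximating f'."""
    def f_prime(x: float) -> float:
        # Central difference; h is an arbitrary small step chosen for illustration.
        return (f(x + h) - f(x - h)) / (2 * h)
    return f_prime


sum_of_squares = sigma(lambda i: i * i, 1, 4)   # 1 + 4 + 9 + 16
square_prime = derivative(lambda x: x * x)      # approximately x -> 2x
three_plus_four = operator.add(3, 4)            # "+" as a function of two numbers
```

Here `derivative` is itself a function whose input and output are both functions, which is the sense in which f ↦ f' is "just another function".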

comment by NaiveTortoise (An1lam) · 2019-10-29T13:58:44.367Z · score: 6 (3 votes) · LW(p) · GW(p)

Blockchain idea inspired by 80,000 Hours's interview of Vitalik Buterin: a lot of podcasts either have terrible transcriptions or presumably pay a service to transcribe their sessions. However, even these services make minor typos such as "ASX" instead of "ASICs" in the linked interview.

Now, most people who read these transcripts presumably notice at least a subset of these typos but don't want to go through the effort of emailing podcasters to tell them about it. On the flip side, there's no good way for hosts to scalably audit flagged typos to see if they're actually typos. What we really want is a mostly automated mechanism to aggregate flagged typos and accept fixes which multiple people agree upon that only pays people (micro amounts) for correctly identifying typos.

This mechanism seems like something that could live on a blockchain in some sort of smart contract. Obviously, like almost every blockchain application, you *could* do it without a blockchain, but using blockchain makes it easy to audit and distribute the micro-payments rather than having to program the voting scheme from scratch on top of a centralized database.
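
To make the voting scheme concrete, here is a toy, centralized sketch of the accept-on-agreement logic in Python. It is not a smart contract, and every name and number in it (`TypoLedger`, the threshold of 3 flaggers, the 0.01 reward) is a hypothetical choice for illustration rather than part of any real system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TypoLedger:
    """Toy stand-in for the proposed contract; all parameters are assumptions."""
    threshold: int = 3      # assumed: distinct flaggers needed to accept a fix
    reward: float = 0.01    # assumed: micro-payment per flagger of an accepted fix
    flags: dict = field(default_factory=lambda: defaultdict(set))
    balances: dict = field(default_factory=lambda: defaultdict(float))
    accepted: dict = field(default_factory=dict)

    def flag(self, user: str, original: str, fix: str) -> bool:
        """Record a proposed fix; return True if this flag triggers acceptance."""
        if original in self.accepted:
            return False                      # already resolved, nothing to pay
        supporters = self.flags[(original, fix)]
        supporters.add(user)                  # a set, so one vote per user
        if len(supporters) >= self.threshold:
            self.accepted[original] = fix
            for supporter in supporters:      # pay only those who backed the accepted fix
                self.balances[supporter] += self.reward
            return True
        return False


ledger = TypoLedger()
ledger.flag("alice", "ASX", "ASICs")
ledger.flag("bob", "ASX", "ASICs")
ledger.flag("mallory", "ASX", "Essex")        # a competing, unsupported fix
accepted_now = ledger.flag("carol", "ASX", "ASICs")
```

The sketch ignores the hard parts, such as stopping one person from voting under many identities, which is presumably where most of the design effort for a real version would go.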

comment by NaiveTortoise (An1lam) · 2020-01-26T18:56:09.057Z · score: 5 (4 votes) · LW(p) · GW(p)

It seems like (unless I just haven't discovered it yet) there's a sore need for a framework for causal model comparison, analogous to Bayesian model comparison. If you read Pearl (and his students), they rightfully point out that you can't get causal claims without causal assumptions but don't talk much about how you actually formulate the causal model in the first place ("domain knowledge"). As a result, if you look at the literature, researchers mostly seem to use a small set of causal models that may or may not describe phenomena, e.g. the classic "instrumental variable" graph, for inference.

I view this as analogous to selecting a prior in applied Bayesian modeling. However, in the Bayesian setting there's a nice set of tools [LW · GW] for comparing how likely different models are, whereas I'm not aware of any such thing in the causal inference world. There's something called "sensitivity analysis" but that's about how much deviation from your assumptions affects your conclusions.
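
For readers unfamiliar with the Bayesian side of the analogy, here is a minimal textbook-style sketch of model comparison via marginal likelihoods, using coin flips. The two models and the function names are assumptions chosen for illustration; this is not taken from the linked post, and it says nothing about how a causal analogue would work.

```python
from math import comb


def evidence_fair_coin(heads: int, flips: int) -> float:
    """P(data | M1), where M1 fixes the probability of heads at 0.5."""
    return comb(flips, heads) * 0.5 ** flips


def evidence_uniform_bias(heads: int, flips: int) -> float:
    """P(data | M2), where M2 puts a uniform prior on the probability of heads.

    Integrating the binomial likelihood over p in [0, 1] gives 1 / (flips + 1).
    """
    return 1.0 / (flips + 1)


def bayes_factor(heads: int, flips: int) -> float:
    """Evidence ratio M2 / M1; values above 1 favor the unknown-bias model."""
    return evidence_uniform_bias(heads, flips) / evidence_fair_coin(heads, flips)


balanced = bayes_factor(heads=5, flips=10)   # data that look fair
lopsided = bayes_factor(heads=9, flips=10)   # data that look biased
```

With balanced data the fair-coin model wins (factor below 1), and with lopsided data the unknown-bias model wins (factor above 1). The missing piece described above is an equally routine way to score competing causal graphs.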

comment by NaiveTortoise (An1lam) · 2020-01-27T15:14:03.598Z · score: 1 (1 votes) · LW(p) · GW(p)

I forgot to include a disclaimer: the one exception I know of is statistical independence tests, which can invalidate graphs but are difficult to use in practice.

comment by NaiveTortoise (An1lam) · 2019-09-15T17:24:05.749Z · score: 5 (3 votes) · LW(p) · GW(p)

Epistemic status: Thinking out loud.

Introducing the Question

Scientific puzzle I notice I'm quite confused about: what's going on with the relationship between thinking and the brain's energy consumption?

On one hand, I'd always been told that thinking harder sadly doesn't burn more energy than normal activity. I believed that and had even come up with a plausible story about how evolution optimizes for genetic fitness not intelligence, and introspective access is pretty bad as it is, so it's not that surprising that we can't crank up our brain's energy consumption to think harder. This seemed to jibe with the notion that our brain's putting way more computational resources towards perceiving and responding to perception than abstract thinking. It also fit well with recent results calling ego depletion into question and into the framework in which mental energy depletion is the result of a neural opportunity cost calculation [LW · GW].

Going even further, studies like this one left me with the impression that experts tended to require less energy to accomplish the same mental tasks as novices. Again, this seemed plausible under the assumption that experts' brains developed some sort of specialized modules over the thousands of hours of practice they'd put in.

I still believe that thinking harder doesn't use more energy, but I'm now much less certain about the reasons I'd previously given for this.

Chess Players' Energy Consumption

This recent ESPN (of all places) article about chess players' energy consumption during tournaments has me questioning this story. The two main points of the article are:

  1. Chess players burn a lot of energy during tournaments, potentially on the order of 6000 calories a day (that's about what marathon runners burn in a day). This results from intense mental stress leading to an elevated heart rate and, as a result, increased oxygen consumption. Chess players also tend to eat less during competitions, which also contributes to weight loss during tournaments (apparently Karpov once lost 20 pounds during an extended chess championship).
  2. Chess players and their coaches now understand that humans aren't Cartesian, i.e. our physical health impacts our cognitive performance, and have responded accordingly with intense physical training regimens.

On the surface, none of this contradicts the claims I cited above. The article's claiming that chess players burn more energy purely from the side effects of stress, not because their brains are doing more work. So why am I revisiting this question?

Gaps in the Evolutionary Justification

First, reading the chess article led me to notice a big gap in the explanation I gave above for why we shouldn't expect a connection between thinking hard and energy consumption. In my explanation, I mentioned that we should expect our brains to spend much more energy on perceptive and reactive processing than on abstract thinking. This still makes sense to me as a general claim about the median mammal, but now seems less plausible to me as it relates to humans specifically. This recent study, for example, provides evidence that our (humans') big brains are one of two primary causes for our increased energy consumption compared to other primates. As far as I can tell, humans don't seem to have meaningfully better coordination or perceptive abilities than chimps. Chimps have opposable thumbs and big toes, and spend their days picking bugs off of each other and climbing trees. Given this, while I admittedly haven't looked into studies on this, I find it hard to imagine that human brains spend much more energy than chimp brains on perception.

Let's say that we put aside the question of what exactly human brains use their extra energy for and bucket it into the loose category of "higher mental functions". This still leaves me with a relevant question: why didn't brains evolve to use varying amounts of energy depending on what they were doing? In particular, if we assume that humans are the first and only mammals that spend large fractions of their calories on "extra" brain functions, then why wasn't there selection pressure to have those functions only use energy when they were needed instead of all the time?

Bringing things back to my original point, in my initial story, thinking didn't impact energy consumption because our brains spend most of their energy on other stuff anyway, so there wasn't strong selective pressure to connect thinking intensity to energy consumption. However, I've just given some evidence that "higher brain functions" actually did come with a significant energy cost, so we might expect that those functions' energy consumption would in fact be context-dependent.

Second, it's weird that what we're doing (mentally) can so dramatically impact our energy consumption due to elevated heart rate and other stress-triggered adaptations but has no impact on the energy our brain consumes. To be clear, it makes sense that physical activity and stress would be intimately connected as this connection is presumably very important for balancing the need to eat/escape predators with the need to not use too much energy when sitting around. What doesn't yet make sense to me is that, even though neurons evolved from the same cells as all the rest of our biology, they proved so resistant to optimization for variable energy consumption.

Rescuing the Original Hypothesis

The best explanation I can come up with for the two puzzles I just discussed is that, for whatever reason, evolution didn't select for a neural architecture that could selectively up- and down-regulate its energy consumption depending on the circumstances. For example, maybe the fact that neurons die when they don't have energy is somehow intimately coupled with their architecture such that there's no way to fix it short of something only a goal-directed consequentialist (and therefore not a hill-climbing process) could accomplish. If this is true, even though humans plausibly would've benefited at some point during our evolutionary history from being able to spend more or less energy on thinking, we shouldn't be surprised that this never happened.

Another weaker (IMO) explanation is that human brains do use more energy in certain situations for some "higher mental functions" but it's not the situations you'd expect. For example, maybe humans use a ton of energy for social cognition and if we could measure the neocortex's energy consumption during parties, we'd find it uses a lot more energy than usual.

comment by eigen · 2019-10-23T23:57:14.905Z · score: 1 (1 votes) · LW(p) · GW(p)

The ESPN article had a misleading title. They go on to say that a player burns 6000 calories a day, but Caruana runs an hour a day (or more). These Grandmasters are not reaching into some esoteric mental ability and burning more calories that way; if anyone has ever seen a Grandmaster play against many players at once, or blindfolded (or even blindfolded and against many players!) one can really understand that they see the board in a way that's pretty different from us.

The classical theory for this is that they have formed bigger/better chunks than us from excessive playing (the very same way a mathematician or a basketball player does). Calorie consumption is thus only a correlate in that specific context.

Although, I think, a (weak) connection could be made between the use of Language and the formation or use of these chunks (who's to say this is not a specialized use of Language?) in the context of a tournament, but I have yet to see anything that supports this idea.

comment by NaiveTortoise (An1lam) · 2019-10-24T00:05:47.284Z · score: 1 (1 votes) · LW(p) · GW(p)

My takeaway from the article was that, to your point, their brains weren't using more energy. Rather, the best hypothesis was just that their adrenal hormones remained elevated for many hours of the day, leading to higher metabolism during that period. Running an hour a day is definitely not enough to burn 6000 calories for the record (a marathon burns around 3500).

Maybe I wasn't clear, but that's what I meant by the following.

The article’s claiming that chess players burn more energy purely from the side effects of stress, not because their brains are doing more work. So why am I revisiting this question?

comment by eigen · 2019-10-24T00:16:53.721Z · score: 1 (1 votes) · LW(p) · GW(p)

Got it! Then I agree with you. I think a better description of my point would be that yeah, these guys are not burning calories by thinking better or harder. Their exercise plus the higher-stress environment could alone account for the large number of calories they burn.

comment by NaiveTortoise (An1lam) · 2019-09-22T20:07:22.944Z · score: 4 (3 votes) · LW(p) · GW(p)

ML-related math trick: I find it easier to imagine a 4D tensor, say of dimensions a × b × c × d, as a big matrix with dimensions a × b within which are nested matrices of dimensions c × d. The nice thing about this is, at least for me, it makes it easier to imagine applying operations over the c × d matrices in parallel, which is something I've had to think about a number of times doing ML-related programming, e.g. trying to figure out how to write the code to apply a 1D convolution-like operation to an entire batch in parallel.
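
A small NumPy sketch of this picture, reading a 4D array as an outer grid of inner matrices. The concrete shape (2, 3, 4, 5) and the names below are arbitrary choices for illustration, and a matrix product stands in for whatever per-matrix operation you actually need.

```python
import numpy as np

# Hypothetical shape: a 2 x 3 outer grid, each cell holding a 4 x 5 inner matrix.
outer_rows, outer_cols, inner_rows, inner_cols = 2, 3, 4, 5
tensor = np.arange(
    outer_rows * outer_cols * inner_rows * inner_cols, dtype=float
).reshape(outer_rows, outer_cols, inner_rows, inner_cols)

# tensor[i, j] is the inner matrix at row i, column j of the outer grid.
one_inner_matrix = tensor[1, 2]

# Reducing over the last axis acts on every inner matrix at once:
# here, the row sums of each inner matrix.
row_sums = tensor.sum(axis=-1)

# Matrix multiplication broadcasts over the two leading (outer) axes,
# so each inner 4 x 5 matrix is multiplied by the same 5 x 7 matrix.
weights = np.ones((inner_cols, 7))
projected = tensor @ weights

# The same computation written as an explicit loop over the outer grid.
looped = np.empty_like(projected)
for i in range(outer_rows):
    for j in range(outer_cols):
        looped[i, j] = tensor[i, j] @ weights
```

The broadcast version and the loop give the same result; the nested-matrix picture is just a way of seeing why.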

comment by crabman · 2019-09-23T18:08:23.164Z · score: 1 (1 votes) · LW(p) · GW(p)

I've been studying tensor decompositions and approximate tensor formats for half a year. Since I've learned about tensor networks, I've noticed that I can draw them to figure out how to code some linear operations on tensors.

Once I used this to figure out how to implement the backward method of a simple neural network layer (not something novel, it was for the sake of learning how deep learning frameworks work). Another time I needed to figure out how to implement the forward method for a Conv2d layer with its weight tensor in CP format. After drawing its output as a tensor network diagram, it was clear that I could just do a sequence of 3 Conv2d layers: pointwise, depthwise, pointwise.

I am not saying that you should learn tensor networks, it's probably a lot of buck for not too large bang unless you want to work with tensor decompositions and formats.

comment by NaiveTortoise (An1lam) · 2019-09-23T19:43:19.590Z · score: 1 (1 votes) · LW(p) · GW(p)

From cursory Googling, it looks like tensor networks are mostly used for understanding quantum systems. I'm not opposed to learning about them, but is there a good resource you can point me to that introduces them independent of the physics concepts? Were you learning them for use in physics?

For example, have you happened to read this Google AI paper introducing their TensorNetwork library and giving an overview?

comment by crabman · 2019-09-23T20:57:39.312Z · score: 1 (1 votes) · LW(p) · GW(p)

Unfortunately I don't know any quantum stuff. I learned them for machine learning purposes.

A monograph by Cichocki et al. (part 1, part 2) is an overview of how tensor decompositions, tensor formats, and tensor networks can be used in machine learning and signal processing. I think it lacks some applications, including acceleration and compression of neural networks by compression of weights of layers using tensor decompositions (this also sometimes improves accuracy, probably by reducing overfit).

Tensor decompositions and Applications by Kolda, Bader 2009 - this is an overview of tensor decompositions. It doesn't have many machine learning applications. Also it doesn't talk of tensor networks, only about some simplest tensor decompositions and specific tensor formats which are the most popular types of tensor networks. This paper was the first thing I read about all the tensor stuff, and it's one of the easier things to read. I recommend you read it first and then look at the topics that seem interesting to you in Cichocki et al.

Tensor spaces and numerical tensor calculus - Hackbusch 2012 - this textbook covers the mathematics of tensor formats and tensor decompositions for Hilbert and Banach spaces. No applications, a lot of math, and functional analysis is kinda a prerequisite. Very dense and difficult to read. It also doesn't talk about tensor networks, only about specific tensor formats.

Handwaving and interpretive dance - This one is simple, and it's about tensor networks, not other tensor stuff. It's for physicists, but chapter 1 and maybe other chapters can be read without a physics background.

Regarding the TensorNetwork library: I've skim-read it. I haven't tried using it. I think it's in early alpha or something. How usable it is for me depends on how well it can interact with PyTorch and how easy it is to do autodifferentiation w.r.t. core tensors and use the tensor network in a PyTorch model. Intuitively the API seemed nice. I think their idea is that you take a tensor, make it into a matrix, do truncated SVD, and now you have 2 matrices, which you turn back into tensors. Then you do the same for them. This way you can perform some but not all popular tensor decomposition algorithms.
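
The "make it into a matrix, do truncated SVD, turn the factors back into tensors" step can be sketched directly in NumPy. To be clear, this is not the TensorNetwork library's API; `split_by_truncated_svd` is a hypothetical helper written for this example, and grouping the first two axes against the last two is just one possible choice.

```python
import numpy as np


def split_by_truncated_svd(tensor: np.ndarray, rank: int):
    """Split a 4D tensor into two 3D tensors joined by a bond of size `rank`."""
    a, b, c, d = tensor.shape
    matrix = tensor.reshape(a * b, c * d)                   # tensor -> matrix
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    u, s, vt = u[:, :rank], s[:rank], vt[:rank, :]          # keep the top `rank` terms
    left = (u * s).reshape(a, b, rank)                      # singular values absorbed on the left
    right = vt.reshape(rank, c, d)                          # matrices -> tensors again
    return left, right


rng = np.random.default_rng(0)
# Build a test tensor whose matricization has rank 2, so rank-2 truncation is lossless.
factor_left = rng.normal(size=(3, 4, 2))
factor_right = rng.normal(size=(2, 5, 6))
tensor = np.einsum("abr,rcd->abcd", factor_left, factor_right)

left, right = split_by_truncated_svd(tensor, rank=2)
reconstructed = np.einsum("abr,rcd->abcd", left, right)
```

Applying the same split again to `left` or `right` (with a different grouping of axes) is the repeated step described above. For a general tensor, truncation gives an approximation rather than an exact reconstruction.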

P.S. Feel free to message me if you have questions about tensor decomposition/network/formats stuff.

comment by NaiveTortoise (An1lam) · 2019-12-23T19:38:56.915Z · score: 3 (4 votes) · LW(p) · GW(p)

(Removed.)

comment by eigen · 2019-12-23T23:55:50.351Z · score: 7 (3 votes) · LW(p) · GW(p)

*writing the movie right now*

Relevant here: https://www.lesswrong.com/posts/bshZiaLefDejvPKuS/dying-outside [LW · GW]

comment by agai · 2019-12-27T21:18:37.503Z · score: -1 (1 votes) · LW(p) · GW(p)

Comment removed for posterity.

comment by [deleted] · 2019-12-24T08:02:58.409Z · score: 0 (4 votes) · LW(p) · GW(p)

I have reported this comment. Hopefully the mods will remove it.

Please don’t speculate on the identity of Satoshi, or spread speculation by others. It has led in multiple cases to people being stalked, blackmailed, harassed, and mugged. Posts like this put innocent lives in physical danger. Be responsible and keep this sort of thing off the Internet.

comment by NaiveTortoise (An1lam) · 2019-12-24T14:37:17.555Z · score: 3 (5 votes) · LW(p) · GW(p)

(Note: responded quickly before removing. I've since edited this comment now that I have more time. Also I'm not the person who downvoted your post.)

I definitely did not intend to cause anyone or their family danger (or harassment, etc.), so I've removed the post.

Mostly in the selfish interest of showing that I wasn't being negligent, I did consider this risk before posting. That's why I noted that I have no information beyond what's already public and was taking into account that since I heard this speculation on a podcast which involved one relatively prominent cryptocurrency person (I won't say who so as not to publicize it further), it seemed unlikely that my post would add additional noise.

All that said, I still agree that even a small chance of harm is more than enough reason to remove the post. Especially, since:

  1. it seems like you're more involved in the crypto community than I and therefore probably have more context than I do on this topic; and
  2. my own version of integrity includes not doing things that only don't cause bad outcomes because they're obscure (related to my second point above).

comment by [deleted] · 2019-12-25T01:40:16.240Z · score: 11 (4 votes) · LW(p) · GW(p)

Thank you. Yes it is a real problem, speaking from experience from the people I personally know. The reason these events are not talked about much is that any press just makes the problem worse—it gives a bunch of copycat muggers the same bright idea. So unfortunately you get a bunch of speculation and not a lot of observable evidence of the downsides of that speculation, so people don’t realize the harm that has been caused.

There are people who have been killed in attempted bitcoin muggings. Speculating on the Internet that someone is in possession of >1 million bitcoins is like tattooing a big target on their back that they can’t get rid of.

comment by NaiveTortoise (An1lam) · 2019-12-25T02:00:22.621Z · score: 2 (2 votes) · LW(p) · GW(p)

Thanks, that helps contextualize.

comment by eigen · 2019-12-24T16:32:11.345Z · score: 3 (1 votes) · LW(p) · GW(p)

For the record, I'm one who downvoted Mark; I don't agree with him, and I think it's sad that you, an1lam, removed the original post, which I don't think did any harm whatsoever. (The reasons should be pretty obvious: a random short-form post about a hypothetical movie is somehow evidence that Hal was Satoshi? I do not think so at all.)

comment by [deleted] · 2019-12-25T01:34:04.029Z · score: 3 (2 votes) · LW(p) · GW(p)

The risk to innocents is real. Physical security is a really hard problem for people in this space, and the police won’t protect those at risk. Does one post on one rationalist website really matter? Yes, for the same reason your vote matters at the ballot box. This is the collective action problem. If nobody self-censors a statement that puts people at risk, the risks only increase over time and those who help propagate the info are morally culpable.

comment by NaiveTortoise (An1lam) · 2019-12-10T12:45:32.897Z · score: 3 (2 votes) · LW(p) · GW(p)

Weird thought I had based on a tweet about gradient descent in the brain: it seems like one under-explored perspective on computational graphs is the causal one. That is, we can view propagating gradients through the computational graph as assessing the effect that an intervention on some node has on all of that node's children.

Reason to think this might be useful:

  • *Maybe* this can act as a different lens for examining NN training?

Reasons why this might not be useful:

  • It's not obvious that it makes sense to think of nodes in an NN (or any differentiable computational graph) as causally related in the sense we usually talk about in causal inference.
  • A causal interpretation of gradients isn't obvious because they're so local, whereas most causal inference focuses on the impact of more non-trivial interventions. OTOH, there are some NN interpretability techniques that try to solve this, so maybe these have better causal interpretations?
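
To make the locality point concrete, here is a minimal sketch in plain Python. The three-node graph, the weights `w` and `b`, and the helper names are all made up for illustration; nothing here comes from the tweet or from any particular NN library. It clamps an intermediate node to a new value, re-runs only that node's descendants, and compares the actual change in the output against what the backpropagated gradient predicts.

```python
import math

def forward(x, w=1.5, b=-0.5, h_override=None):
    """Run a tiny computational graph: h = w*x + b, z = tanh(h), y = z**2.

    If h_override is given, node h is clamped to that value (a do()-style
    intervention) and only its descendants z and y are recomputed.
    """
    h = w * x + b if h_override is None else h_override
    z = math.tanh(h)
    y = z ** 2
    return h, z, y

def grad_y_wrt_h(h):
    """Backpropagated gradient dy/dh via the chain rule: dy/dz * dz/dh."""
    z = math.tanh(h)
    return (2 * z) * (1 - z ** 2)

x = 1.0
h, z, y = forward(x)
g = grad_y_wrt_h(h)

def intervention_effect(delta):
    """Actual change in y when node h is clamped to h + delta."""
    _, _, y_new = forward(x, h_override=h + delta)
    return y_new - y

# Compare the true effect of each intervention with the linear prediction
# that the gradient gives for the same shift.
for delta in (1e-4, 0.1, 2.0):
    actual = intervention_effect(delta)
    predicted = g * delta
    print(f"delta={delta:g}  actual={actual:.6f}  gradient prediction={predicted:.6f}")
```

For a tiny shift the gradient matches the effect of the intervention closely, but for a large shift the two diverge because `tanh` saturates. That is the sense in which a gradient only answers a very local interventional question.
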
comment by NaiveTortoise (An1lam) · 2019-10-24T19:57:41.166Z · score: 3 (2 votes) · LW(p) · GW(p)

If algebra's a deal with the devil where you get the right answer but don't know why, then geometric intuition's a deal with the devil where you always get an answer but don't know whether it's right.

comment by NaiveTortoise (An1lam) · 2019-10-24T19:35:35.608Z · score: 3 (3 votes) · LW(p) · GW(p)

Someone should write the equivalent of TAOCP for machine learning.

(Ok, maybe not literally the equivalent. I mean Knuth is... Knuth. So it doesn't seem realistic to expect someone to do something as impressive as TAOCP. And yes, this is authority worship and I don't care. He's Knuth goddamn it.)

Specifically, a book where the theory/math's rigorous but the algorithms are described in their efficient forms. I haven't found this in the few ML books I've read parts of (Bishop's Pattern Recognition and Machine Learning, MacKay's Information Theory, and Tibshirani et al.'s Elements of Statistical Learning), so if it's already out there, let me know.

Note that I don't mean that whoever does this should do the whole MMIX thing and write their own language and VM.

comment by NaiveTortoise (An1lam) · 2019-09-27T03:19:38.809Z · score: 3 (3 votes) · LW(p) · GW(p)

Today I attended the first of two talks in a two-part mini-workshop on Variational Inference. It's interesting to think about from the perspective of my recent musings about more science-y vs. engineering mindsets because it highlighted the importance of engineering/algorithmic progress in widening Bayesian methods' applicability.

The presenter, who's a fairly well known figure in probabilistic ML and has developed some well known statistical inference algorithms, talked about how part of the reason so much time was spent debating philosophical issues in the past was that Bayesian inference wasn't computationally tractable until Gelfand & Smith brought Gibbs Sampling into mainstream Bayesian statistics in the '90s.
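
For readers who haven't seen Gibbs sampling, here is a toy sketch of the idea in plain Python. It is not the presenter's example: the target (a standard bivariate normal), the correlation `rho=0.8`, and the function names are assumptions chosen because the full conditionals are known in closed form. The sampler never draws from the joint distribution directly; it only alternates draws from each one-dimensional conditional.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is a 1-D normal:
        x | y ~ N(rho * y, 1 - rho**2)
        y | x ~ N(rho * x, 1 - rho**2)
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1 - rho ** 2)
    x, y = 0.0, 0.0
    samples = []
    for i in range(burn_in + n_samples):
        # Alternate draws from the two conditionals, always using the
        # most recent value of the other coordinate.
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if i >= burn_in:  # discard the early, not-yet-mixed draws
            samples.append((x, y))
    return samples

def sample_correlation(samples):
    """Pearson correlation of the (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    cov = sum((x - mx) * (y - my) for x, y in samples) / n
    vx = sum((x - mx) ** 2 for x, _ in samples) / n
    vy = sum((y - my) ** 2 for _, y in samples) / n
    return cov / math.sqrt(vx * vy)

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print(f"estimated correlation: {sample_correlation(samples):.3f}")
```

In a real Bayesian model the two coordinates would be parameters and the conditionals would come from the model's structure, but the loop has the same shape.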

To be clear, the type of progress I'm talking about is still "scientific" in the sense that it mostly involves applied math and finding good ways to approximate posterior distributions. But it's "engineering" in the sense that it's the messy sort of work I talked about in my other post, where messy means a lot of the methods don't have a good theoretical backing and involve making questionable (at least ex ante) statistical assumptions. Now, the counter is of course that we don't have a theoretical backing yet, but there still may be one in the future.

I'll probably have more to say about this when the workshop's over but I partly just wanted to record my thoughts while they were fresh.

comment by NaiveTortoise (An1lam) · 2019-12-13T21:35:30.350Z · score: 2 (2 votes) · LW(p) · GW(p)

Link post for a short post I just published describing my way of understanding Simpson's Paradox.

comment by NaiveTortoise (An1lam) · 2019-10-14T14:50:40.298Z · score: 2 (2 votes) · LW(p) · GW(p)

Thing I desperately want: tablet native spaced repetition software that lets me draw flashcards. Cloze deletions are just boxes or hand-drawn occlusions.