An1lam's Short Form Feed

post by An1lam · 2018-08-11T18:33:15.983Z · score: 14 (3 votes) · LW · GW · 26 comments

Inspired by Hazard's Shortform Feed [LW · GW] -- which I really enjoy -- itself based on Raemon's Shortform feed, I'm making my own. There be thoughts here. Hopefully, this will also get me posting more.

Comments sorted by top scores.

comment by An1lam · 2019-09-02T15:11:09.203Z · score: 16 (5 votes) · LW · GW

Watching my kitten learn/play has been interesting from a "how do animals compare to current AIs?" perspective. At a high level, I think I've updated slightly towards RL agents being further along the evolutionary progress ladder than I'd previously thought.

I've seen RL agents' inability to do long-term planning cited as evidence that they're not as smart as animals, and while I think that's probably accurate, I have noticed that my kitten takes a surprisingly long time to learn even 2-step plans. For example, when it plays with a toy on a string, I'll often put the toy on a chair that it only knows how to reach by jumping onto another chair first. It took many attempts before it learned to jump onto the other chair and then climb to where I'd put the toy, even though it had previously done exactly that while exploring many times. And even then, it seems to be at risk of "catastrophic forgetting": we'll be playing in the same way later and it won't remember to do the 2-step move. Relatedly, its learning is fairly narrow even for basic skills, e.g. I have 4 identical chairs around a table but it will be afraid of jumping onto one even though it's very comfortable jumping onto another.

Now part of this may be that cats are known for being biased towards trial-and-error compared to other similarly-sized mammals like dogs (see Gwern's write-up for more on this) and that adult cats may be better than kittens at "long-term" planning. However, a lot of critiques of RL, such as Josh Tenenbaum's, argue that our AIs don't even compare to young children in terms of their abilities. This is undoubtedly true with respect to the ability to actually move around in the world, grasp objects, etc., but seems less true than I'd previously thought with respect to "higher level" cognitive abilities such as planning. To make this more concrete, I'm skeptical that my kitten could currently succeed at a real-life analogue of Montezuma's Revenge.

Another thing I've observed relates to some recent work by Konrad Kording, Adam Marblestone, and Greg Wayne on integrating deep learning and neuroscience. They postulate that, due to the genomic bottleneck, it's plausible that brains leverage heterogeneous, evolving cost functions to do semi-supervised learning throughout development. While much more work needs to be done investigating this hypothesis (as the authors acknowledge), it does ring true with some of my observations of my kitten. In particular, I've noticed that it recently became much more interested in climbing things and jumping onto objects on its own, whereas previously I couldn't even get it to do so using treats. This seems like a plausible example of a "switch" being flipped that increased the reward for being high up (or something like that; obviously this is quite hand-wavy).

I'm trying to come up with predictions that I can make regarding the next few months based on these two initial observations but don't have any great ideas yet.

comment by An1lam · 2019-09-02T17:00:53.243Z · score: 15 (9 votes) · LW · GW

Cruxes I Have With Many LW Readers

There's a crux I seem to have with a lot of LWers that I've struggled to put my finger on for a long time but I think reduces to some combination of:

  • faith in elegance vs. expectation of messiness;
  • preference for axioms vs. examples;
  • identification as primarily a scientist/truth-seeker vs. as an engineer/builder.

I tend to be more inclined towards the latter in each case, whereas I think a lot of LWers are inclined towards the former, with the potential exception of the author of realism about rationality [LW · GW], who seems to have opinions that overlap with many of my own. While I still feel uncomfortable with the above binaries, I've now gathered enough examples to at least list them as evidence for what I'm talking about.

Example 1: Linear Algebra Textbooks

A few [LW · GW] LWers [LW · GW] have positively reviewed Linear Algebra Done Right (LADR), in particular complimenting it for revealing the inner workings of Linear Algebra. I too recently read most of this book and did a lot of the exercises. And... I liked it but seemingly less than the other reviewers. In particular, I enjoyed getting a lot of practice reading definition-theorem-proof style math and doing lots of proofs myself, but found myself wishing for more examples and discussion of how to compute things like eigenvalues in practice. While I know that's not what the book's about, the difference I'm pointing to is more that I found the omission of these things bothersome, whereas I suspect the other reviewers were happy with the focus on constructing the different objects mathematically (I'm also obviously making some assumptions here).

On the other hand, I've recently been reading sections of Shilov's Linear Algebra, which is more concrete but does more ugly stuff like present the determinant very early on, and I feel like I'm learning better from it.

I think one contributing factor towards this preference difference is that I tend to be more OK with unmotivated messiness if the messy thing is clearly useful for something, but less OK with slogging through a bunch of elegant but not-clear-what-it's-used-for buildup. Another way to put this would be that I tend to like to get a top-down view of a subject and then go depth-first afterwards, whereas others seem happy to learn bottom-up. I used to think this was because of my experience with programming, where algorithms are pretty much always presented in terms of their purpose and tend to become messier as they get optimized for performance. I still like knowing the motivation for things, but I also accept that stuff that works for real applications often has a bunch of messiness. On the other hand, a lot of LWers are also programmers who are only now going deep on math, and they seem to still be happy with the axiomatic math way of doing things. So having a programming background doesn't seem to correlate with my preferences that strongly...

What would be great would be if someone would chime in providing better hypotheses/explanations than the one I've given.

Example 2: Scientists vs. Engineers as Role Models

Much of early LW content, the Sequences in particular, used scientists like Einstein and Feynman as role models in discussions (and also, in fairness, as targets of criticism). While I love Feynman and Einstein too, I tend to also revere builders/engineers, such as John Carmack, Jeff Dean, and Konrad Zuse, but these types of people don't seem to get nearly as much praise on LW.

One explanation for this is that great but not necessarily thoughtful engineers can drive X-risk through their work. For example, here's [LW · GW] a discussion where a few folks argue that AGI requires insight more than programming ability and explicitly mention needing Judea Pearl more than John Carmack. While this is a fair argument, I'm skeptical that it's the true rejection. Security mindset seems to be as common among engineers as it is among scientists given that most of the folks who participate in things like DefCon and work in computer security tend to be hardcore engineer types like Trammell Hudson. (In his original essay, Eliezer cites Bruce Schneier, definitely an engineer, as someone he trusts to have security mindset.)

Another potential explanation for this is that LW readers tend to like doing and learning about science (pure math included) more than doing engineering. It's plausible that people who were attracted to early LW/OB content and were compelled by arguments for X-risk tend to also prefer science to engineering.

Conclusion

Unfortunately, I don't have some sort of nice insight to conclude this with. I don't think the differences between my preferences and other LWers' are bad so much as an implicit thing that doesn't get discussed.

I am curious whether my dichotomies seem reasonably accurate to anyone reading this. And if so, do my hypotheses for them seem reasonable?

comment by mr-hire · 2019-09-02T18:08:20.110Z · score: 12 (4 votes) · LW · GW

I have similar differences with many people on LW and agree there is something of an unacknowledged aesthetic here.

comment by jimrandomh · 2019-09-10T01:05:42.962Z · score: 4 (3 votes) · LW · GW

I think the engineer mindset is more strongly represented here than you think, but that the nature of nonspecialist online discussion warps things away from the engineer mindset and towards the scientist mindset. Both types of people are present, but the engineer-mindset people tend not to put that part of themselves forward here.

The problem with getting down into the details is that there are many areas with messy details to get into, and it's hard to appreciate the messy details of an area you haven't spent enough time in. So deep dives in narrow topics wind up looking more like engineer-mindset, while shallow passes over wide areas wind up looking more like scientist-mindset. LessWrong posts can't assume much background, which limits their depth.

I would be happy to see more deep-dives; a lightly edited transcript of John Carmack wouldn't be a prototypical LessWrong post, but it would be a good one. But such posts are necessarily going to exclude a lot of readers, and LessWrong isn't necessarily going to be competitive with posting in more topic-specialized places.

comment by An1lam · 2019-09-10T02:33:40.011Z · score: 3 (2 votes) · LW · GW

These are all good points.

After I saw that Benito did a transcript post, I considered doing one for one of Carmack's talks or a recent interview of Yann LeCun I found pretty interesting (based on the talks of his I've listened to, LeCun has a pretty engineering-y mindset even though he's nominally a scientist). Not going to happen immediately though, since it requires a pretty big time investment.

Alternatively, maybe I'll review Masters of Doom, which is where I learned most of what I know about Carmack.

comment by Pattern · 2019-09-03T00:12:13.320Z · score: 3 (2 votes) · LW · GW
What would be great would be if someone would chime in providing better hypotheses/explanations than the one I've given.

As the dichotomy isn't jumping out at me, I guess I should read both of those books* sometime and see which I like more.

*Linear Algebra Done Right (LADR)

Shilov's Linear Algebra

comment by Ruby · 2019-09-02T17:29:22.642Z · score: 3 (6 votes) · LW · GW

This is really interesting, I'm glad you wrote this up. I think there's something to it.

Some quick comments:

  • I generally expect there to exist simple underlying principles in most domains which give rise to the messiness (and often the messiness seems a bit less messy once you understand them). Perceiving "messiness" also often feels to me like a lack of understanding, whereas seeing the underlying unity makes me feel like I get whatever the subject matter is.
  • I think I would like it if LessWrong had more engineers/inventors as role models and that it's something of an oversight that we don't. Yet I also feel like John Carmack probably isn't remotely near the level of Pearl (I'm not that familiar with Carmack's work): pushing forward video game development doesn't compare to neatly figuring out what exactly causality itself is.
    • There might be something like all truly monumental engineering breakthroughs depended on something like a "scientific" breakthrough. Something like Faraday and Maxwell figuring out theories of electromagnetism is actually a bigger deal than Edison(/others) figuring out the lightbulb, the radio, etc. There are cases of lauded people who are a little more ambiguous on the science/engineer dichotomy. Turing? Shannon? Tesla? Shockley et al with the transistor seems kind of like an engineering breakthrough, and seems there could be love for that. I wonder if Feynman gets more recognition because as an educator we got a lot more of the philosophy underlying his work. Just rambling here.
  • A little on my background: I did an EE degree which had a very practical focus. My experience is that I was taught how to apply a lot of equations and make things in the lab, but most courses skimped on providing real understanding, which left me overall worse as an engineer. The math majors actually understood Linear Algebra, the physicists actually understood electromagnetism, and I knew enough to make some neat things in the lab and pass tests, but I was worse off for not having a deeper "theoretical" understanding. So I feel like I developed more of an identity as an engineer, but came to feel that to be a really great engineer I needed to get the core science better*.

*I have some recollection that Tesla could develop a superior AC electric system because he understood the underlying math better than Edison, but this is a vague recollection.

comment by jimrandomh · 2019-09-10T00:36:57.021Z · score: 8 (3 votes) · LW · GW
Yet I also feel like John Carmack probably isn't remotely near the level of Pearl (I'm not that familiar with Carmack's work): pushing forward video game development doesn't compare to neatly figuring out what exactly causality itself is.

You're looking at the wrong thing. Don't look at the topic of their work; look at their cognitive style and overall generativity. Carmack is many levels above Pearl. Just as importantly, there's enough recorded video of him speaking unscripted that it's feasible to absorb some of his style.

comment by Ruby · 2019-09-12T01:54:31.874Z · score: 2 (1 votes) · LW · GW
You're looking at the wrong thing. Don't look at the topic of their work; look at their cognitive style and overall generativity.

By generativity do you mean "within-domain" generativity?

Carmack is many levels above Pearl.

To unpack which "levels" I was grading on, it's something like a blend of "importance and significance of their work" / "difficulty of the problems they were solving"; admittedly that's still pretty vague. On those dimensions, it seems entirely fair to compare across topics and assert that Pearl was solving more significant and more difficult problem(s) than Carmack. And for that, "style" isn't especially relevant. (This can also be true even if Carmack solved many more problems.)

But I'm curious about your angle - when you say that Carmack is many levels above Pearl, which specific dimensions is that on (generativity and style?) and do you have any examples/links for those?

comment by jimrandomh · 2019-09-12T02:01:17.861Z · score: 2 (1 votes) · LW · GW
By generativity do you mean "within-domain" generativity?

Not exactly, because Carmack has worked in more than one domain (albeit not as successfully; Armadillo Aerospace never made orbit).

On those dimensions, it seems entirely fair to compare across topics and assert that Pearl was solving more significant and more difficult problem(s) than Carmack

Agree on significance, disagree on difficulty.

comment by mr-hire · 2019-09-06T19:33:57.290Z · score: 5 (3 votes) · LW · GW
There might be something like all truly monumental engineering breakthroughs depended on something like a "scientific" breakthrough. Something like Faraday and Maxwell figuring out theories of electromagnetism is actually a bigger deal than Edison(/others) figuring out the lightbulb, the radio, etc. There are cases of lauded people who are a little more ambiguous on the science/engineer dichotomy. Turing? Shannon? Tesla? Shockley et al with the transistor seems kind of like an engineering breakthrough, and seems there could be love for that. I wonder if Feynman gets more recognition because as an educator we got a lot more of the philosophy underlying his work. Just rambling here.

TRIZ is an engineering discipline that has something called the five levels of innovation, which talks about this:

1. You solve a problem by using a common solution in your own speciality.

2. You solve a problem using a common solution in your own industry.

3. You solve a problem using a common solution found in other industries.

4. You solve a problem using a solution built on first principles (e.g. little-known scientific principles).

5. You solve a problem by discovering a new principle/scientific rule.

comment by Ruby · 2019-09-12T01:46:40.283Z · score: 2 (1 votes) · LW · GW

Seems you're referring to this https://en.wikipedia.org/wiki/TRIZ?

comment by mr-hire · 2019-09-12T01:58:44.719Z · score: 2 (1 votes) · LW · GW

Yes.

comment by An1lam · 2019-09-02T18:34:04.009Z · score: 2 (2 votes) · LW · GW

Thanks for your reply! I agree with a lot of what you said.

First off, thanks for bringing up the point about underlying principles. I agree that there are often underlying principles in many domains and that I also really like to find unity in seeming messiness. I used to be of the more extreme view that principles were in some sense more important than the details, but I've become more skeptical over time for two reasons.

  1. From a pedagogy perspective, I've personally never had much luck learning principles without having a strong base of practice & knowledge. That said, when I have that base, learning principles helps me improve further and is satisfying.
  2. I've realized over time how much of action (where action can include thinking) is based upon a set of non-verbal strategies that one learns through practice and experimentation even in seemingly theoretical domains. These strategies seem to be the secret sauce that allow one to act fluently but seem meaningfully different from the types of principles people often discuss.

Another way to phrase my argument is that principles are important but very hard to transfer between minds. It's possible you agree and I'm just belaboring the point but I wanted to make it explicit.

One concrete example of the distinction I'm drawing is something called the "What Are Monads Fallacy" in the Haskell community, where people try to explain monads by conveying their understanding of what monads really are, even though they themselves learned about monads by just using them a bunch, which later led to them developing a higher-level understanding. This reflects a more general problem where experts often struggle to teach novices because they don't realize that their broad understanding is actually founded upon a lower-level understanding of a lot of details.

I think I would like it if LessWrong had more engineers/inventors as role models and that it's something of an oversight that we don't. Yet I also feel like John Carmack probably isn't remotely near the level of Pearl (I'm not that familiar with Carmack's work): pushing forward video game development doesn't compare to neatly figuring out what exactly causality itself is.

I tentatively agree, but it's pretty hard to draw comparisons. From an insight perspective, I agree that Pearl's work on Bayes Nets and causality was probably more profound than anything Carmack came up with. From an economic perspective, though, Carmack had a massive, albeit indirect, impact on the trajectory of the computing world. By coming up with new algorithms and techniques for 3D game rendering at a time when people had basically no idea how to render 3D games in realtime, Carmack drove the gaming industry forward, which certainly contributed to the development of better GPUs and processors as well. Carmack was also the person at id Software who insisted on making their games moddable and releasing their game engines, which eventually led to the development of games like Half-Life.

That said, a better point of comparison to Pearl is probably Jeff Dean, who, in close collaboration with Sanjay Ghemawat, first rewrote much of Google's search stack from scratch after it started failing to scale and then went on to invent BigTable, MapReduce, Spanner, and TensorFlow!

There might be something like all truly monumental engineering breakthroughs depended on something like a "scientific" breakthrough. Something like Faraday and Maxwell figuring out theories of electromagnetism is actually a bigger deal than Edison(/others) figuring out the lightbulb, the radio, etc. There are cases of lauded people who are a little more ambiguous on the science/engineer dichotomy. Turing? Shannon? Tesla?

Agree that science tends to be upstream of later technology developments, but I would emphasize that there are probably cases where without great engineers, the actual applications never get built. For example, there was a large gap between us understanding genes fairly well and being able to sequence and, more recently, synthesize them.

Shockley et al with the transistor seems kind of like an engineering breakthrough, and seems there could be love for that.

I agree with this and would add Lynn Conway, who co-invented VLSI design, one of the key enablers of the modern processor industry and Moore's Law.

A little on my background: I did an EE degree which had a very practical focus. My experience is that I was taught how to apply a lot of equations and make things in the lab, but most courses skimped on providing real understanding, which left me overall worse as an engineer.

To be clear, I shared this frustration with the engineering curriculum. I started as a Computer Engineering major and switched to CS because I felt like engineering was just a bag of unmotivated tricks whereas in CS you could understand why things were the way they were. However, part of the reason I liked CS's theory was because it was presented in the context of understanding algorithms.

As a final point, I don't think I did a good job of my original post of emphasizing that I'm pro-understanding and pro-theory! I mostly endorse the saying, "nothing is so practical as a good theory." My perceived disagreement is more around how much I trust/enjoy theory for its own sake vs. with an eye towards practice.

comment by Ruby · 2019-09-12T02:23:23.216Z · score: 2 (1 votes) · LW · GW

Sorry for the delayed reply on this one.

I do think we agree on rather a lot here. A few thoughts:

1. Seems there are separate questions of role models/heroes/personal identity and separate questions of pedagogy.

You might strongly seek unifying principles and elegant theories but believe the correct way to arrive at these and understand these is through lots of real-world messy interactions and examples. That seems pretty right to me.

2. Your examples in this comment do make me update on the importance of engineering types and engineering feats. It makes me think that LessWrong indeed focuses too much on heroes of "understanding" when there are also heroes of "making things happen", which is rather a key part of rationality too.

A guess might be that this is downstream of what was focused on in the Sequences and the culture that set. If I'm interpreting Craft and the Community [LW · GW] correctly, Eliezer never saw the Sequences as covering all of rationality or all of what was important, just his own particular sub-art that he created in the course of trying to do one particular thing.

That's my dream—that this highly specialized-seeming art of answering confused questions, may be some of what is needed, in the very beginning, to go and complete the rest.

Seemingly, answering confused questions is more science-y than engineering-y and would place focus on great scientists like Feynman. Unfortunately, the community has not yet supplemented the Sequences with the rest of the art of human rationality, and so most of the LW culture is still downstream of the Sequences alone. Given that, we can expect the culture is missing major key pieces of what would be the full art, e.g. whatever skills are involved in being Jeff Dean and John Carmack.

My perceived disagreement is more around how much I trust/enjoy theory for its own sake vs. with an eye towards practice.

About that you might be correct. Personally, I do think I enjoy theory even without application. I'm not sure if my mind secretly thinks all topics will find their application, but having applications (beyond what is needed to understand) doesn't feel key to my interest, so something.

comment by An1lam · 2019-09-12T19:02:29.950Z · score: 9 (5 votes) · LW · GW

At this point, I basically agree that we agree and that the most useful follow-up action is for someone (read: me) to actually be the change they want to see and write some (object-level), and ideally good, content with a more engineering-y bent.

As I mentioned in my reply to jimrandomh, a book review seems like a good place for me to start.

comment by Ruby · 2019-09-12T22:24:37.982Z · score: 2 (1 votes) · LW · GW

Cool. Looking forward to it!

comment by An1lam · 2019-03-31T23:31:25.586Z · score: 7 (4 votes) · LW · GW

I've recently been obsessing over the idea of trying to "make math more like programming". I'm not sure if it's just because I feel fluent at programming and still not very fluent at abstract math, or whether it's also because programming really does have a feedback loop that you don't get in math.

Regardless, based on my reading it seems like there's a general consensus in math that even the most modern theorem provers, like Lean and Coq, are much less efficient than typical "informal" math reasoning. That said, I wonder if this ignores some of the benefits that programmers get from writing in a formal language, e.g. automatic refactoring tools, fast feedback loops, and code analysis/search tools. Also, it seems like a sufficiently user-friendly theorem-proving tool could be useful for education. If kids can learn how to program in Javascript, I have to believe they can learn to prove theorems, even if the learning curve's relatively steep.
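
As a tiny illustration of what "math as programming" could feel like, here's a machine-checked proof in Lean (Lean 3 syntax; purely illustrative):

```lean
-- A statement and its proof, checked by the proof assistant much like a
-- compiler checks a program. `nat.add_comm` is the library lemma stating
-- that addition of natural numbers is commutative.
example (a b : ℕ) : a + b = b + a := nat.add_comm a b
```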

Maybe once I play around with Lean more, I'll change my mind, but for now, I'm sticking to my contrarian viewpoint.

comment by Pattern · 2019-06-04T20:20:29.749Z · score: 4 (3 votes) · LW · GW

It seems like a useful idea on a lot of levels.

There's a difference between solving a problem where you're 1) trying to figure out what to do, 2) executing an algorithm, or 3) evaluating a closed-form solution (plugging the values into the equation, performing the operations, and seeing what the number is).***

Names. If you're writing a program and you decide to give things (including functions/methods) names like the letters of the alphabet, it's hard for other people to understand what you're doing. Including future you. As a math enthusiast I see the benefit of not having to generate names*, but teaching-wise? I can see some benefits of merging/mixing. (What's sigma notation? It's a for loop.)
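
A minimal sketch of that last parenthetical (illustrative Python, names made up):

```python
# Sigma notation as a for loop: sum_{i=1}^{n} i^2 written imperatively.
def sum_of_squares(n):
    total = 0
    for i in range(1, n + 1):  # i runs over the same bounds as the sigma
        total += i ** 2
    return total

assert sum_of_squares(4) == 1 + 4 + 9 + 16  # = 30
```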

Functions. You can say f' is the derivative of f. Or you can get into the fact that there are functions** that take other functions as arguments. You can focus narrowly on functions of one variable. Or you can notice that + is a function that takes two numbers (just like *, /, ^).
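
A small sketch of that point in Python (illustrative only; the step size h is an arbitrary choice):

```python
# A function that takes another function as an argument: a numerical
# derivative, so "f prime" is itself the output of a higher-order function.
def derivative(f, h=1e-6):
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

square = lambda x: x ** 2
d_square = derivative(square)
print(d_square(3.0))  # approximately 6
```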

*Like when your idea of what you're doing with something changes as you go and there's no refactoring tool on paper to change all the names at the last minute. (Though paper feels pretty nice to work with. That technology is really ergonomic.)

**And that the word function has more than one meaning. There's a bit of a difference between a way of calculating something and a lookup table.

***Also, seeing how things generalize can be easier with tools that can automatically check if the changes you've made have broken what you were making. (Writing tests.)

comment by An1lam · 2019-09-15T17:24:05.749Z · score: 5 (3 votes) · LW · GW

Epistemic status: Thinking out loud.

Introducing the Question

Scientific puzzle I notice I'm quite confused about: what's going on with the relationship between thinking and the brain's energy consumption?

On one hand, I'd always been told that thinking harder sadly doesn't burn more energy than normal activity. I believed that and had even come up with a plausible story about how evolution optimizes for genetic fitness, not intelligence, and introspective access is pretty bad as it is, so it's not that surprising that we can't crank up our brains' energy consumption to think harder. This seemed to jibe with the notion that our brains put way more computational resources towards perceiving and responding to perception than towards abstract thinking. It also fit well with recent results calling ego depletion into question and with the framework in which mental energy depletion is the result of a neural opportunity cost calculation [LW · GW].

Going even further, studies like this one left me with the impression that experts tended to require less energy to accomplish the same mental tasks as novices. Again, this seemed plausible under the assumption that experts' brains developed some sort of specialized modules over the thousands of hours of practice they'd put in.

I still believe that thinking harder doesn't use more energy, but I'm now much less certain about the reasons I'd previously given for this.

Chess Players' Energy Consumption

This recent ESPN (of all places) article about chess players' energy consumption during tournaments has me questioning this story. The two main points of the article are:

  1. Chess players burn a lot of energy during tournaments, potentially on the order of 6000 calories a day (that's about what marathon runners burn in a day). This results from intense mental stress leading to an elevated heart rate and, as a result, increased oxygen consumption. Chess players also tend to eat less during competitions, which also contributes to weight loss during tournaments (apparently Karpov once lost 20 pounds during an extended chess championship).
  2. Chess players and their coaches now understand that humans aren't Cartesian, i.e. our physical health impacts our cognitive performance, and have responded accordingly with intense physical training regimens.

On the surface, none of this contradicts the claims I cited above. The article's claiming that chess players burn more energy purely from the side effects of stress, not because their brains are doing more work. So why am I revisiting this question?

Gaps in the Evolutionary Justification

First, reading the chess article led me to notice a big gap in the explanation I gave above for why we shouldn't expect a connection between thinking hard and energy consumption. In my explanation, I mentioned that we should expect our brains to spend much more energy on perceptive and reactive processing than on abstract thinking. This still makes sense to me as a general claim about the median mammal, but now seems less plausible to me as it relates to humans specifically. This recent study, for example, provides evidence that our big brains are one of the two primary causes of humans' increased energy consumption compared to other primates. As far as I can tell, humans don't seem to have meaningfully better coordination or perceptive abilities than chimps. Chimps have opposable thumbs and big toes, and spend their days picking bugs off of each other and climbing trees. Given this, while I admittedly haven't looked into studies on this, I find it hard to imagine that human brains spend much more energy than chimps' on perception.

Let's say that we put aside the question of what exactly human brains use their extra energy for and bucket it into the loose category of "higher mental functions". This still leaves me with a relevant question: why didn't brains evolve to use varying amounts of energy depending on what they were doing? In particular, if we assume that humans are the first and only mammals that spend large fractions of their calories on "extra" brain functions, then why wasn't there selection pressure to have those functions only use energy when they were needed instead of all the time?

Bringing things back to my original point, in my initial story, thinking didn't impact energy consumption because our brains spend most of their energy on other stuff anyway, so there wasn't strong selective pressure to connect thinking intensity to energy consumption. However, I've just given some evidence that "higher brain functions" actually did come with a significant energy cost, so we might expect that those functions' energy consumption would in fact be context-dependent.

Second, it's weird that what we're doing (mentally) can so dramatically impact our energy consumption via elevated heart rate and other stress-triggered adaptations but has no impact on the energy our brain consumes. To be clear, it makes sense that physical activity and stress would be intimately connected, as this connection is presumably very important for balancing the need to eat/escape predators with the need to not use too much energy when sitting around. What doesn't yet make sense to me is that, even though neurons evolved from the same cells as all the rest of our biology, they proved so resistant to optimization for variable energy consumption.

Rescuing the Original Hypothesis

The best explanation I can come up with for the two puzzles I just discussed is that, for whatever reason, evolution didn't select for a neural architecture that could selectively up- and down-regulate its energy consumption depending on the circumstances. For example, maybe the fact that neurons die when they don't have energy is somehow intimately coupled with their architecture such that there's no way to fix it short of something only a goal-directed consequentialist (and therefore not a hill-climbing process) could accomplish. If this is true, even though humans plausibly would've benefited at some point during our evolutionary history from being able to spend more or less energy on thinking, we shouldn't be surprised it never happened.

Another weaker (IMO) explanation is that human brains do use more energy in certain situations for some "higher mental functions" but it's not the situations you'd expect. For example, maybe humans use a ton of energy for social cognition and if we could measure the neocortex's energy consumption during parties, we'd find it uses a lot more energy than usual.

comment by An1lam · 2019-09-22T20:07:22.944Z · score: 4 (3 votes) · LW · GW

ML-related math trick: I find it easier to imagine a 4D tensor, say of dimensions n1 × n2 × n3 × n4, as a big n1 × n2 matrix within which are nested matrices of dimensions n3 × n4. The nice thing about this is, at least for me, it makes it easier to imagine applying operations over the nested matrices in parallel, which is something I've had to think about a number of times doing ML-related programming, e.g. trying to figure out how to write the code to apply a 1D convolution-like operation to an entire batch in parallel.
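
A rough NumPy sketch of this mental model (shapes are made up for illustration):

```python
import numpy as np

# View a 4D tensor of shape (n1, n2, n3, n4) as an n1 x n2 "outer matrix"
# whose entries are n3 x n4 matrices, and operate on all of them at once.
x = np.random.rand(8, 3, 5, 7)
y = np.swapaxes(x, -1, -2)  # transpose every nested 5x7 matrix in parallel -> (8, 3, 7, 5)

# The same mindset for applying a 1D convolution to a whole batch at once:
kernel = np.array([1.0, 0.0, -1.0])
signals = np.random.rand(8, 100)  # a batch of 8 signals of length 100
out = np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="valid"), 1, signals)
print(out.shape)  # (8, 98)
```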

comment by crabman · 2019-09-23T18:08:23.164Z · score: 1 (1 votes) · LW · GW

I've been studying tensor decompositions and approximate tensor formats for half a year. Since I've learned about tensor networks, I've noticed that I can draw them to figure out how to code some linear operations on tensors.

Once I used this to figure out how to implement the backward method of a simple neural network layer (not something novel, it was for the sake of learning how deep learning frameworks work). Another time I needed to figure out how to implement the forward method for a Conv2d layer with its weight tensor in CP format. After drawing its output as a tensor network diagram, it was clear that I could just do a sequence of 3 Conv2d layers: pointwise, depthwise, pointwise.
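
For what it's worth, here's a rough PyTorch sketch of that pointwise -> depthwise -> pointwise replacement (the function name, shapes, and rank are mine, purely for illustration, not from any particular paper or library):

```python
import torch.nn as nn

def cp_conv2d(in_channels, out_channels, kernel_size, rank):
    """Sketch: replace a Conv2d whose weight tensor is stored in CP format."""
    return nn.Sequential(
        # pointwise: mix input channels down to the CP rank
        nn.Conv2d(in_channels, rank, kernel_size=1, bias=False),
        # depthwise: spatial convolution applied independently per rank component
        nn.Conv2d(rank, rank, kernel_size=kernel_size, groups=rank,
                  padding=kernel_size // 2, bias=False),
        # pointwise: mix rank components up to the output channels
        nn.Conv2d(rank, out_channels, kernel_size=1, bias=False),
    )
```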

I am not saying that you should learn tensor networks, it's probably a lot of buck for not too large bang unless you want to work with tensor decompositions and formats.

comment by An1lam · 2019-09-23T19:43:19.590Z · score: 1 (1 votes) · LW · GW

From cursory Googling, it looks like tensor networks are mostly used for understanding quantum systems. I'm not opposed to learning about them, but is there a good resource you can point me to that introduces them independent of the physics concepts? Were you learning them for use in physics?

For example, have you happened to read this Google AI paper introducing their TensorNetwork library and giving an overview?

comment by crabman · 2019-09-23T20:57:39.312Z · score: 1 (1 votes) · LW · GW

Unfortunately I don't know any quantum stuff. I learned them for machine learning purposes.

A monograph by Cichocki et al. (part 1, part 2) is an overview of how tensor decompositions, tensor formats, and tensor networks can be used in machine learning and signal processing. I think it lacks some applications, including acceleration and compression of neural networks by compressing the weights of layers using tensor decompositions (this also sometimes improves accuracy, probably by reducing overfitting).

Tensor Decompositions and Applications by Kolda & Bader, 2009 - this is an overview of tensor decompositions. It doesn't have many machine learning applications. Also, it doesn't talk of tensor networks, only about some of the simplest tensor decompositions and the specific tensor formats which are the most popular types of tensor networks. This paper was the first thing I read about all the tensor stuff, and it's one of the easier things to read. I recommend you read it first and then look at the topics that seem interesting to you in Cichocki et al.

Tensor Spaces and Numerical Tensor Calculus - Hackbusch, 2012 - this textbook covers the mathematics of tensor formats and tensor decompositions for Hilbert and Banach spaces. No applications, a lot of math; functional analysis is kind of a prerequisite. Very dense and difficult to read. Also doesn't talk of tensor networks, only about specific tensor formats.


Handwaving and Interpretive Dance - This one is simple; it's about tensor networks, not other tensor stuff. It's for physicists, but chapter 1 and maybe other chapters can be read without a physics background.


Regarding the TensorNetwork library: I've skim-read it but haven't tried using it. I think it's in early alpha or something. How usable it is for me depends on how well it can interact with PyTorch, and how easy it is to do autodifferentiation w.r.t. core tensors and use the tensor network in a PyTorch model. Intuitively the API seemed nice. I think their idea is that you take a tensor, make it into a matrix, do truncated SVD, and now you have 2 matrices, which you turn back into tensors. Then you do the same for them. This way you can perform some, but not all, popular tensor decomposition algorithms.
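
If I understand that idea correctly, the core step looks something like this in plain NumPy (not the TensorNetwork API; shapes and the rank are arbitrary):

```python
import numpy as np

t = np.random.rand(4, 5, 6)                 # a small 3D tensor
mat = t.reshape(4, 5 * 6)                   # matricize: group the last two modes
u, s, vt = np.linalg.svd(mat, full_matrices=False)
rank = 3                                    # truncation rank
core_a = u[:, :rank] * s[:rank]             # first factor, shape (4, rank)
core_b = vt[:rank, :].reshape(rank, 5, 6)   # second factor, folded back into a tensor
approx = np.tensordot(core_a, core_b, axes=(1, 0))    # reconstruct, shape (4, 5, 6)
print(np.linalg.norm(t - approx) / np.linalg.norm(t))  # relative approximation error
```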

P.S. Feel free to message me if you have questions about tensor decomposition/network/format stuff.

comment by An1lam · 2019-09-27T03:19:38.809Z · score: 3 (3 votes) · LW · GW

Today I attended the first of two talks in a two-part mini-workshop on Variational Inference. It's interesting to think about from the perspective of my recent musings about science-y vs. engineering mindsets because it highlighted the importance of engineering/algorithmic progress in widening Bayesian methods' applicability.

The presenter, who's a fairly well-known figure in probabilistic ML and has developed some well-known statistical inference algorithms, talked about how part of the reason so much time was spent debating philosophical issues in the past was that Bayesian inference wasn't computationally tractable until the development of Gibbs sampling in the '90s by Gelfand & Smith.
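
As a toy illustration of the kind of computation Gibbs sampling makes easy (this is the textbook bivariate-normal example, not Gelfand & Smith's setup): you only ever need to sample each variable from its conditional distribution given the others, which is often tractable even when the joint posterior isn't.

```python
import numpy as np

# Gibbs sampling from a standard bivariate normal with correlation rho by
# alternately drawing each coordinate from its conditional given the other.
rho = 0.8
rng = np.random.default_rng(0)
x, y = 0.0, 0.0
samples = np.zeros((5000, 2))
for i in range(len(samples)):
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))  # x | y ~ N(rho*y, 1 - rho^2)
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))  # y | x ~ N(rho*x, 1 - rho^2)
    samples[i] = x, y
print(np.corrcoef(samples[1000:].T))  # off-diagonal ~ rho after burn-in
```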

To be clear, the type of progress I'm talking about is still "scientific" in the sense that it mostly involves applied math and finding good ways to approximate posterior distributions. But it's "engineering" in the sense that it's the messy sort of work I talked about in my other post, where "messy" means a lot of the methods don't have a good theoretical backing and involve making questionable (at least ex ante) statistical assumptions. Now, the counter is of course that we just don't have a theoretical backing yet, but there still may be one in the future.

I'll probably have more to say about this when the workshop's over but I partly just wanted to record my thoughts while they were fresh.

comment by An1lam · 2019-10-14T14:50:40.298Z · score: 2 (2 votes) · LW · GW

Thing I desperately want: tablet-native spaced repetition software that lets me draw flashcards. Cloze deletions are just boxes or hand-drawn occlusions.