My supervillain origin story
post by Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-27
When I started graduate school (for math), I was very interested in big ideas. I had had a couple of experiences of general research intuitions panning out really well, and felt like the core of good research is having a brave idea, a gestalt. I went into grad school looking for the “gestalt people”: the people whose math had that mysterious, cutting-edge flavor but was not too pop. (At the time the sexiest thing around was higher category theory; I was drawn to it and tried to learn it, but I didn’t want to do the “common” thing of working in that field.) I ended up choosing an advisor, and skipped over any computational or “applications-driven” (insofar as doing calculations in Galois theory or string theory counts as applications) projects that he recommended working on. It really wasn’t my thing: you had to read long technical papers, use the right lemmas, and apply them to get actual numbers (yuck) that people would then build on in future research. I wanted to have the big ideas.
I ended up coming up with my own research project: to check that an object my advisor had discovered – a certain new category associated to a Lie group – was equivalent to another category that I defined by applying higher category theory to some combinatorial data. This was exactly the kind of general thing I wanted: no numbers, purely theoretical, applicable to a fully general class of groups. “Gestalt.” My advisor was skeptical but let me work on it.
I ended up trying to prove a false result for a year and a half.
It wasn’t one of those “cute” false results, where all the cases you check work but there’s an unexpected edge case or counterexample. It was a straight-up false result where, if you really carefully worked out the smallest nontrivial example, you would see it’s false. The problem is that the falseness was still a little sneaky. In this flavor of representation theory, the smallest interesting case, the group GL2 (i.e., the group of invertible 2×2 matrices), is already somewhat hard (there is an infamous 300-page book about it), and while you wouldn’t have to read 300 pages to find the contradiction, you would have to work on it intentionally: clearly think through the implications of the result being correct, with a view towards finding a concrete computation or check that would lead to a contradiction. Instead of this, I alternated between trying to find the “correct” proof and thinking about what nice related consequences I would get from it once it was proven.
While this was going on, I was attending a representation theory seminar organized by (among others) Pavel Etingof, a professor who has been a friend and mentor of mine since high school. (Outside math, he is a minor celebrity in mushroom-picking circles :)
Pavel is a wonderful, warm person with the best sense of humor of anyone I know. But he is a nightmare lecture attendee (I am known to be bad, but he is much worse). He asks a lot of questions. I once saw him interrupt a graduate student’s talk after the “background context” stage to excitedly point out “oh, and notice that there is a nice obvious consequence of this,” and proceed to accidentally explain a stronger version of the student’s thesis in five minutes.
But one question he almost always asks, at any representation theory talk, is “can we check this for GL2?” He will then derail the talk to go through, on the blackboard, a computation or derivation of what the big formal construction does in this minimal interesting case. (Often, once this is done, the rest of the talk is moot, since in many contexts GL2 is a “representative” case: once you understand all the nontrivialities at this level, they carry over directly to all other groups.)
While at the time I was inspired by Pavel and on some level noticed the usefulness of working out concrete cases, I never thought of myself as a concrete person: I was a big-idea guy. And I paid for this with my PhD research.
In the end I got lucky. The “half” of the equivalence that I knew for sure (a “functor” which just happens not to be an equivalence) was enough to prove a new result, which I wrote up in my thesis. But the realization that months of research had gone down the toilet set me on a villain’s journey of noticing the flaws in Gestalt reasoning. Is this very optimistic idea that you think will prove all of mirror symmetry really reasonable? How hard have you checked it? Did you look at GL2?
I am still at heart a big-idea person. I love overconfident statements; I love thinking that “this one idea is all you need (plus a bunch of context around this idea)”. But I also love to nitpick and be skeptical. I love to notice when someone hasn’t actually gone through the work of really checking (i.e., with a view to disproving, not just perfunctorily) whether their idea applies in the minimal interesting case.
I think this has both good and bad consequences.
The good consequence is that I think I have finally (after over a decade) made progress in internalizing the idea of “checking this for GL2”. In my own research, I try to find a minimal operationalization that is interesting (i.e., doesn’t follow from simpler contexts) and nontrivial, where an idea I have might break.
The bad consequence is that I sometimes overdo this when thinking about other people’s research. I do think there is such a thing as “wrongly shaped” research. You won’t get very far if a core untested assumption you made is false, or if you’re trying to build some uber-formal philosophical picture of “what transformers really do” without ever actually looking at a single paper or experiment involving real transformers. But there is also research that is “usefully wrong”. I notice cases where someone with a strong intuition about something real and interesting will try to explain it, and others will object that it makes a suspicious assumption that doesn’t stand up to scrutiny, or captures an interesting but oversimplified picture that doesn’t correspond to my understanding of “what is realistically useful”, or fails in some specific case. And I sometimes (in an intuitive sense that’s hard to pin down with examples) notice the skeptics (usually, me) being wrong here, or even being right but interrupting an interesting chain of reasoning that could be self-correcting.
A recent and very obvious example of the latter came when Kaarel Hänni and I were discussing the results of the “leap complexity” paper. Kaarel was excited about this paper, but I was skeptical: I felt like its assumptions were a bit too limiting. And then, after a bit more discussion, I jumped at a clear nitpick that I could use to falsify the paper: some obvious equivariance properties imply that the general result it claims has counterexamples and can’t be correct. Since the authors of this paper are very legit, I thought at first I might be misunderstanding it; Kaarel and I worked through some examples, and in the end he agreed with me that the counterexamples were real. We were halfway through writing the authors a politely phrased “your paper is wrong and garbage” email when, just in case, we tried looking through the paper one more time to see if maybe we were misunderstanding something. We noticed that at the end of the paper the authors themselves go through the exact same counterexample argument, and explain that one of the assumptions they make early in the paper (which, to be fair, is a bit sneakily hidden) rules out exactly this kind of counterexample. The paper was (unsurprisingly) correct. Eventually I came to appreciate this paper’s depth and joined Kaarel as an enthusiastic believer in its “conceptual” message. Thinking through related ideas together led both of us to refine our research ideas in useful ways (in particular, my current ideas about “analogy circuits” owe a lot to these discussions). And there’s a lesson in noticing that if I had followed my nose, I’d have dismissed this angle of inquiry and never engaged with these ideas (or, in the authors’ shoes, I would never have embarked on this research at all after noticing the counterexamples to the general case).
I don’t know what the moral of this story is. I still hold strongly to my supervillain inception moment; I believe in carefully working out simple cases and open-mindedly looking for counterexamples to the ideas you’re excited about. I wouldn’t mind having the phrase “Check it for GL2” written on my tombstone, and frankly, I’m still learning that lesson. But I guess the supervillain in me has also embarked on a bit of a redemption arc. I’m starting to actively see the issues with excess skepticism, and I’m trying to learn to push through skepticism a bit more in thinking about my own research and others’. I’m still trying to balance these ideas out: to correctly calibrate, in a research context, the struggle between faith and cynicism. If I were to try to extract a meaningful “take” here, I would fail or say something banal. Thankfully I don’t have to, since it has been done before, on this site, really clearly and cogently: this is Elizabeth’s post on butterfly ideas. Go read it now, while I go back and brood in my lair.