Comments
Domain: Software engineering, mech interp
Bryce Meyer (primary maintainer of TransformerLens and a software engineer with many years of experience) hosts a weekly live-coding stream where he works on TransformerLens: resolving bugs, adding features and tests, etc. I've found it useful!
You can find it in the Open Source Mechanistic Interpretability Slack, under the "code-sessions" channel (feel free to DM for an invite).
Great post, but there is one part I'd like to push back on:
Iterators are also easier to identify, both by their resumes and demonstrated skills. If you compare two CVs of postdocs that have spent the same amount of time in academia, and one of them has substantially more papers (or GitHub commits) to their name than the other (controlling for quality), you’ve found the better Iterator. Similarly, if you compare two CodeSignal tests with the same score but different completion times, the one completed more quickly belongs to the stronger Iterator.
This seems like a bit of an over-claim. I would endorse a weaker claim, like "in the presence of a high volume of applicants, CodeSignal tests, GitHub commits, and paper count statistically provide some signal," but the reality of work in the fields of research and software development is often such that there isn't a clean correspondence between these measures and someone's performance. In addition, all three of these measures are quite easy to game (or Goodhart).
For example, in research alone, not every paper entails the same-sized project; two high-quality papers could have an order of magnitude difference in the amount of work required to produce them. Not every research bet pays off, either: some projects don't result in papers, and research management often plays a role in which directions get pursued (and which get dropped, or not, when they prove unproductive). There are also many researchers who have made a career out of getting their names on as many papers as possible; there is an entire science to doing this that is completely independent of your actual research abilities.
In the case of CodeSignal evaluations, the signal is likewise relatively low-dimensional and primarily conveys one thing: enough experience with a relatively small set of patterns that one can do the assessment very quickly. I've taken enough of these and seen enough reviews from senior engineers on CodeSignal tests to know that they capture only a small, specific part of what it takes to be a good engineer, and that they overemphasize speed. Speed is not the main thing you want from an actual senior engineer; you want quality as well as maintainability and readability, which are often at odds with speed. Senior engineers' first instinct is not generally to jump in and start spitting out lines of code like their lives depend on it. Then there's the issue of how hackable/gameable the assessments are; senior engineer Yanir Seroussi has a good blog post on CodeSignal specifically: https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/
I'm definitely not arguing that these metrics are useless, however. They do provide some signal (especially if the volume of applicants is high), but I'd suggest that we see them as imperfect proxies that we're forced to use due to insufficient manpower for comprehensive candidate evaluations, rather than actually capturing some kind of ground truth.
Perhaps more important than these details: How do you curate input to take notes on, and what is the purpose you take the notes for? How do you use the notes once written? (This latter point seems to be one of the biggest reasons many people have dropped PKM systems.)
Very kind of you to say. :) I think for me, though, the source of the emotion I felt when reading this series was something like: "Ah, so in addition to ensuring we are dateable ourselves, we must fix society, capitalism (at least the dating part of it), culture, etc. in order to have a Good Dating Universe." Which in retrospect was a bit overblown of me, so I think I no longer endorse the strong version of what I said in that comment.
I think this list may successfully convince some to stay off the dating market indefinitely. Who in the world has time to work on all of this? At best, this is just a massive set of to-dos; at worst, it's an enormous list of all the ways the dating world sucks and reasons why you'll fail.
Upon reflection: This is a good collection of information, even if it is rather discouraging to read. May we all find exceptions to the unfortunate trends that seem to characterize the modern dating landscape.
I actually went through the same process as what you describe here, but it didn't remove my "transhumanist" label. I was a big fan of Humanity+, excited about human upgrading, etc. etc. I then became disillusioned about progress in the relevant fields, started to understand nonduality and the lack of a persistent or independent self, and realized AI was the only critical thing that actually was in the process of happening.
In that sense, my process was similar but I still consider myself a transhumanist. Why? Because for me, solving death or trying to make progress in the scientific fields that lead to various types of augmentations aren't the biggest or most critical pieces of transhumanism. One could probably have been a transhumanist in the 1800s, because for me it's about the process of imagining and defining and philosophizing about what humanity--on an individual organism level as well as on a sociocultural level--will become (or what it might be worthwhile to become) after particular types of technological transitions.
Admittedly, there is a normative component that's something like "those of us who want to should be able to become something more than base human," and it isn't really active until those capabilities actually exist. But the questions of what it might be worthwhile to become, what the transition will be like, and what matters and is valuable in this kind of future are all important.
It's not about maximizing the self, either--I'm not an extropian. Whether or not something called "me" exists in this future (which might be soon but might not), the conscious experience of beings in it matters to me (and in this sense I'm a longtermist).
Will an aligned AI solve death? Maybe, but my hopes don't rely on this. Humanity will almost certainly change in diverse ways, and is already changing a bit (though often not in great ways). It's worthwhile to think about what kind of changes we would want to create, given greater powers to do so.
Yes, the tuned lens is an excellent tool and generally superior to the original logit lens. In this particular case, though, I don't think it would show very different results (and in any case the logit lens is only a small part of the analysis). Still, I think it would be interesting to have some kind of integration with TransformerLens that enabled training and using the tuned lens as well.
This is a cool idea, and I have no doubt it helped somewhat, but IMO it falls prey to the same mistake I see made by the makers of almost every video series/online course/list of resources for ML math: assuming that math is mostly about concepts and facts.
It's only about 5% that. Maybe less. I and many others in ML have seen the same videos and remembered the concepts for a while too. And forgotten them, in time. More than once! On the other hand, I've seen how lastingly and operationally fluent people become (especially in ML and interpretability) when they actually learn math the way it must be learned: via hundreds of hours of laborious exercises, proofs, derivations, etc. Videos and lectures are a small fraction of what's ultimately needed.
For most of ML, it's probably fine--you'll never need to do a proof or do more than simple linear algebra operations by hand. But if you want to do the really hard stuff, especially in interpretability, I don't think there's any substitute for cranking through those hours.
To be clear, I think this weekend was a great start on that, provided you continue immediately on to full courses and do the exercises. I'm a top-down learner, so it would certainly help me. But unless it's practiced in very short order, it will be forgotten, and just become a collection of terms you recognize when others talk about them.