when i was new to research, i wouldn't feel motivated to run any experiment that wouldn't make it into the paper. surely it's much more efficient to only run the experiments that people want to see in the paper, right?
now that i'm more experienced, i mostly think of experiments as something i do to convince myself that a claim is correct. once i get to that point, actually getting the final figures for the paper is the easy part. the hard part is finding something unobvious but true. with this mental frame, it feels very reasonable to run 20 experiments for every experiment that makes it into the paper.
libraries abstract away the low level implementation details; you tell them what you want to get done and they make sure it happens. frameworks are the other way around. they abstract away the high level details; as long as you implement the low level details you're responsible for, you can assume the entire system works as intended.
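to make the inversion of control concrete, here's a tiny toy sketch (all names are made up, not any real library or framework): with a library, you own the control flow and call into it; with a framework, it owns the control flow and calls into you.

```python
# library: you own the control flow and call into it whenever you want.
def sort_scores(scores):          # stand-in for a library function
    return sorted(scores, reverse=True)

top = sort_scores([3, 1, 2])      # you decide when and why this runs

# framework: it owns the control flow and calls into you.
class MiniFramework:              # stand-in for a framework
    def __init__(self):
        self.handlers = {}

    def route(self, name):
        def register(fn):
            self.handlers[name] = fn
            return fn
        return register

    def run(self, request):
        # the framework decides when your code runs; you only fill in the handler
        return self.handlers[request]()

app = MiniFramework()

@app.route("home")
def home():                       # you implement the low level piece it asks for
    return "hello"

print(top, app.run("home"))
```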
a similar divide exists in human organizations and with managing up vs down. with managing up, you abstract away the details of your work and promise to solve some specific problem. with managing down, you abstract away the mission and promise that if a specific problem is solved, it will make progress towards the mission.
(of course, it's always best when everyone has state on everything. this is one reason why small teams are great. but if you have dozens of people, there is no way for everyone to have all the state, and so you have to do a lot of abstracting.)
when either abstraction leaks, it causes organizational problems -- micromanagement, or loss of trust in leadership.
the laws of physics are quite compact. and presumably most of the complexity in a zygote is in the dna.
a thriving culture is a mark of a healthy and intellectually productive community / information ecosystem. it's really hard to fake this. when people try, it usually comes off weird. for example, when people try to forcibly create internal company culture, it often comes off as very cringe.
don't worry too much about doing things right the first time. if the results are very promising, the cost of having to redo it won't hurt nearly as much as you think it will. but if you put it off because you don't know exactly how to do it right, then you might never get around to it.
the tweet is making fun of people who are too eager to do something EMPIRICAL and SCIENTIFIC and ignore the pesky little detail that their empirical thing actually measures something subtly but importantly different from what they actually care about
i've changed my mind and been convinced that it's kind of a big deal that frontiermath was framed as something that nobody would have access to for hillclimbing when in fact openai would have access and other labs wouldn't. the undisclosed funding before o3 launch still seems relatively minor though
lol i was the one who taped it to the wall. it's one of my favorite tweets of all time
this doesn't seem like a huge deal
in retrospect, we know from chinchilla that gpt3 allocated its compute too much to parameters as opposed to training tokens. so it's not surprising that models since then are smaller. model size is a less fundamental measure of model cost than pretraining compute. from here on i'm going to assume that whenever you say size you meant to say compute.
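to make the allocation point concrete, here's a rough back-of-envelope using the standard C ≈ 6ND approximation and the ~20 tokens/parameter chinchilla rule of thumb (the gpt3 numbers are approximate public figures, so treat the exact outputs loosely):

```python
# rough back-of-envelope: gpt3 vs a chinchilla-optimal allocation of the same compute,
# using C ~= 6 * N * D (flops ~= 6 * params * tokens) and ~20 tokens/param as the
# chinchilla-optimal rule of thumb.
N_gpt3 = 175e9             # parameters
D_gpt3 = 300e9             # training tokens
C = 6 * N_gpt3 * D_gpt3    # ~3.15e23 flops

ratio = 20                 # chinchilla-optimal tokens per parameter (rule of thumb)
N_opt = (C / (6 * ratio)) ** 0.5   # solve C = 6 * N * (ratio * N) for N
D_opt = ratio * N_opt

print(f"gpt3 compute ~ {C:.2e} flops")
print(f"chinchilla-optimal at same compute: ~{N_opt/1e9:.0f}B params, ~{D_opt/1e12:.1f}T tokens")
# => roughly ~50B params and ~1T tokens, i.e. far fewer params and far more tokens than gpt3 used
```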
obviously it is possible to train better models using the same amount of compute. one way to see this is that it is definitely possible to train worse models with the same compute, and it is implausible that the current model production methodology is the optimal one.
it is unknown how much compute the latest models were trained with, and therefore what compute efficiency win they obtain over gpt4. it is unknown how much more effective compute gpt4 used than gpt3. we can't really make strong assumptions using public information about what kinds of compute efficiency improvements have been discovered by various labs at different points in time. therefore, we can't really make any strong conclusions about whether the current models are not that much better than gpt4 because of (a) a shortage of compute, (b) a shortage of compute efficiency improvements, or (c) a diminishing return of capability wrt effective compute.
suppose I believe the second coming involves the Lord giving a speech on capitol hill. one thing I might care about is how long until that happens. the fact that lots of people disagree about when the second coming is doesn't mean the Lord will give His speech soon.
similarly, the thing that I define as AGI involves AIs building Dyson spheres. the fact that other people disagree about when AGI is doesn't mean I should expect Dyson spheres soon.
people disagree heavily on what the second coming will look like. this, of course, means that the second coming must be upon us
I agree that labs have more compute and more top researchers, and these both speed up research a lot. I disagree that the quality of responses is the same as outside labs, if only because there is lots of knowledge inside labs that's not available elsewhere. I think these positive factors are mostly orthogonal to the quality of software infrastructure.
some random takes:
- you didn't say this, but when I saw the infrastructure point I was reminded that some people seem to have a notion that any ML experiment you can do outside a lab, you will be able to do more efficiently inside a lab because of some magical experimentation infrastructure or something. I think unless you're spending 50% of your time installing cuda or something, this basically is just not a thing. lab infrastructure lets you run bigger experiments than you could otherwise, but it costs a few sanity points compared to the small experiment. oftentimes, the most productive way to work inside a lab is to avoid existing software infra as much as possible.
- I think safetywashing is a problem but from the perspective of an xrisky researcher it's not a big deal because for the audiences that matter, there are safetywashing things that are just way cheaper per unit of goodwill than xrisk alignment work - xrisk is kind of weird and unrelatable to anyone who doesn't already take it super seriously. I think people who work on non xrisk safety or distribution of benefits stuff should be more worried about this.
- this is totally n=1 and in fact I think my experience here is quite unrepresentative of the average lab experience, but I've had a shocking amount of research freedom. I'm deeply grateful for this - it has turned out to be incredibly positive for my research productivity (e.g the SAE scaling paper would not have happened otherwise).
I think this is probably true of you and people around you, but also you likely live in a bubble. To be clear, I'm not making a claim about whether people reading this should travel; I'm just describing what a lot of travel is actually like.
theory: a large fraction of travel is because of mimetic desire (seeing other people travel and feeling fomo / keeping up with the joneses), signalling purposes (posting on IG, demonstrating socioeconomic status), or mental compartmentalization of leisure time (similar to how it's really bad for your office and bedroom to be the same room).
this explains why in every tourist destination there are a whole bunch of very popular tourist traps that are in no way actually unique/comparatively-advantaged to the particular destination. for example: shopping, amusement parks, certain kinds of museums.
ok good that we agree interp might plausibly be on track. I don't really care to argue about whether it should count as prosaic alignment or not. I'd further claim that the following (not exhaustive) are also plausibly good (I'll sketch each out for the avoidance of doubt because sometimes people use these words subtly differently):
- model organisms - trying to probe the minimal sets of assumptions to get various hypothesized spicy alignment failures seems good. what is the least spoonfed demonstration of deceptive alignment we can get that is analogous mechanistically to the real deal? to what extent can we observe early signs of the prerequisites in current models? which parts of the deceptive alignment arguments are most load bearing?
- science of generalization - in practice, why do NNs sometimes generalize and sometimes not? why do some models generalize better than others? In what ways are humans better or worse than NNs at generalizing? can we understand this more deeply without needing mechanistic understanding? (all closely related to ELK)
- goodhart robustness - can you make reward models which are calibrated even under adversarial attack, so that when you optimize them really hard, you at least never catastrophically goodhart them?
- scalable oversight (using humans, and possibly giving them a leg up with e.g secret communication channels between them, and rotating in different humans when we need to simulate amnesia) - can we patch all of the problems with e.g debate? can we extract higher quality work out of real life misaligned expert humans for practical purposes (even if it's maybe a bit cost uncompetitive)?
in capabilities, the most memetically successful things were for a long time not the things that actually worked. for a long time, people would turn their noses up at the idea of simply scaling up models because it wasn't novel. the papers which are in retrospect the most important did not get that much attention at the time (e.g gpt2 was very unpopular among many academics; the Kaplan scaling laws paper was almost completely unnoticed when it came out; even the gpt3 paper flew under the radar when it first came out.)
one example of a thing within prosaic alignment that i feel has the possibility of generalizability is interpretability. again, if we take the generalizability criterion and map it onto the capabilities analogy, it would be something like scalability - is this a first step towards something that can actually do truly general reasoning, or is it just a hack that will no longer be relevant once we discover the truly general algorithm that subsumes the hacks? if it is on the path, can we actually shovel enough compute into it (or its successor algorithms) to get to agi in practice, or do we just need way more compute than is practical? and i think at the time of gpt2 these were completely unsettled research questions! it was actually genuinely unclear whether writing articles about ovid's unicorn was a genuine first step towards agi, or just some random amusement that would fade into irrelevancy. i think interp is in a similar position where it could work out really well and eventually become the thing that works, or it could just be a dead end.
some concrete examples:
- "agi happens almost certainly within in the next few decades" -> maybe ai progress just kind of plateaus for a few decades, it turns out that gpqa/codeforces etc are like chess in that we only think they're hard because humans who can do them are smart but they aren't agi-complete, ai gets used in a bunch of places in the economy but it's more like smartphones or something. in this world i should be taking normie life advice a lot more seriously.
- "agi doesn't happen in the next 2 years" -> maybe actually scaling current techniques is all you need. gpqa/codeforces actually do just measure intelligence. within like half a year, ML researchers start being way more productive because lots of their job is automated. if i use current/near-future ai agents for my research, i will actually just be more productive.
- "alignment is hard" -> maybe basic techniques is all you need, because natural abstractions is true, or maybe the red car / blue car argument for why useful models are also competent at bad things is just wrong because generalization can be made to suck. maybe all the capabilities people are just right and it's not reckless to be building agi so fast
i think it's quite valuable to go through your key beliefs and work through what the implications would be if they were false. this has several benefits:
- picturing a possible world where your key belief is wrong makes it feel more tangible and so you become more emotionally prepared to accept it.
- if you ever do find out that the belief is wrong, you don't flinch away as strongly because it doesn't feel like you will be completely epistemically lost the moment you remove the Key Belief
- you will have more productive conversations with people who disagree with you on the Key Belief
- you might discover strategies that are robustly good whether or not the Key Belief is true
- you will become better at designing experiments to test whether the Key Belief is true
there are two different modes of learning i've noticed.
- top down: first you learn to use something very complex and abstract. over time, you run into weird cases where things don't behave how you'd expect, or you feel like you're not able to apply the abstraction to new situations as well as you'd like. so you crack open the box and look at the innards and see a bunch of gears and smaller simpler boxes, and it suddenly becomes clear to you why some of those weird behaviors happened - clearly it was box X interacting with gear Y! satisfied, you use your newfound knowledge to build something even more impressive than you could before. eventually, the cycle repeats, and you crack open the smaller boxes to find even smaller boxes, etc.
- bottom up: you learn about the 7 Fundamental Atoms of Thingism. you construct the simplest non-atomic thing, and then the second simplest non atomic thing. after many painstaking steps of work, you finally construct something that might be useful. then you repeat the process anew for every other thing you might ever find useful. and then you actually use those things to do something
generally, i'm a big fan of top down learning, because everything you do comes with a source of motivation for why you want to do the thing; bottom up learning often doesn't give you enough motivation to care about the atoms. but also, bottom up learning gives you a much more complete understanding.
there is always too much information to pay attention to. without an inexpensive way to filter, the field would grind to a complete halt. style is probably a worse thing to select on than even academia cred, just because it's easier to fake.
I'm sympathetic to most prosaic alignment work being basically streetlighting. However, I think there's a nirvana fallacy going on when you claim that the entire field has gone astray. It's easiest to illustrate what I mean with an analogy to capabilities.
In capabilities land, there were a bunch of old school NLP/CV people who insisted that there's some kind of true essence of language or whatever that these newfangled neural network things weren't tackling. The neural networks are just learning syntax, but not semantics, or they're ungrounded, or they don't have a world model, or they're not representing some linguistic thing, so therefore we haven't actually made any progress on true intelligence or understanding etc etc. Clearly NNs are just progress on the surface appearance of intelligence while actually just being shallow pattern matching, so any work on scaling NNs is actually not progress on intelligence at all. I think this position has become more untenable over time. A lot of people held onto this view deep into the GPT era, but now even the skeptics have to begrudgingly admit that NNs are pretty big progress even if additional Special Sauce is needed, and that the research approaches which targeted general intelligence more directly haven't done better.
It's instructive to think about why this was a reasonable thing for people to have believed, and why it turned out to be wrong. It is in fact true that NNs are kind of shallow pattern matchy even today, and that literally just training bigger and bigger NNs eventually runs into problems. Early NNs - heck, even very recent NNs - often have trouble with relatively basic reasoning that humans have no problem with. But the mistake is assuming that this means no progress has been made on "real" intelligence just because no NN so far has perfectly replicated all of human intelligence. Oftentimes, progress towards the hard problem doesn't immediately look like tackling the meat of the hard problem directly.
Of course, there is also a lot of capabilities work that is actually just completely useless for AGI. Almost all of it, in fact. Walk down the aisle at neurips and a minimum of 90% of the papers will fall in this category. A lot of it is streetlighting capabilities in just the way you describe, and does in fact end up completely unimpactful. Maybe this is because all the good capabilities work happens in labs nowadays, but this was true even at earlier neuripses back when all the capabilities work got published. Clearly, a field can be simultaneously mostly garbage and also still make alarmingly fast progress.
I think this is true for basically everything - most work will be crap (often predictably so ex ante), due in part to bad incentives, and then there will be a few people who still do good work anyways. This doesn't mean that any pile of crap must have some good work in there, but it does mean that you can't rule out the existence of good work solely by pointing at the crap and the incentives for crap. I do also happen to believe that there is good work in prosaic alignment, but that goes under the object level argument umbrella, so I won't hash it out here.
sure, the thing you're looking for is the status system that jointly optimizes for alignedness with what you care about, and how legible it is to the people you are trying to convince.
a lot of unconventional people intentionally choose to ignore normie-legible status systems. these systems can take the form of either expert consensus or some form of widely accepted feedback from reality. for example, many researchers, especially around these parts, just don't publish in normal ML conferences at all, opting instead to depart into their own status systems. or they don't care whether their techniques can be used to make very successful products, or make surprisingly accurate predictions, etc. instead, they substitute some alternative status system, like approval of a specific subcommunity.
there's a grain of truth to this, which is that the normal status system is often messed up (academia has terrible terrible incentives). it is true that many people overoptimize the normal status system really hard and end up not producing very much value.
but the problem with starting your own status system (or choosing to compete in a less well-agreed-upon one) is that it's unclear to other people how much stock to put in your status points. it's too easy to create new status systems. the existing ones might be deeply flawed, but at least their difficulty is a known quantity.
one common retort is that it's not worth proving yourself to people who are too closed minded and only accept ideas if they are validated by some legible status system. this is true to some extent, and i'm generally against people spending too much effort optimizing normie status (e.g i think people should be way less worried about getting a degree in order to be taken seriously / get a job offer), but it's possible to take this too far.
a rational decision maker should in fact discount claims of extremely illegible quality, because there are simply too many of them and it's too hard to pick out the good ones even if they were there (that's sort of the whole thing about illegibility!). it seems bad to only bestow the truth upon people who happen to be irrational in ways that cause them to take you seriously by chance. if left unchecked, this kind of thing can also very easily evolve into a cult, where the unmooring from reality checks allows huge epistemic distortions.
a good in between approach might be to do some very legibly impressive things, just to prove that you can in fact do well at the legible status system if you chose to, and are intentionally choosing not to (as opposed to choosing alternative status systems because you're not capable of getting status in the legible system).
simple ideas often require tremendous amounts of effort to make work.
twitter is great because it boils saying funny things down to purely a problem of optimizing for funniness, with twitter handling the logistics of discovery and distribution. being e.g a comedian is a lot more work.
corollary: oftentimes, when smart people say things that are clearly wrong, what's really going on is they're saying the closest thing in their frame that captures the grain of truth
the world is too big and confusing, so to get anything done (and to stay sane) you have to adopt a frame. each frame abstracts away a ton about the world, out of necessity. every frame is wrong, but some are useful. a frame comes with a set of beliefs about the world and a mechanism for updating those beliefs.
some frames contain within them the ability to become more correct without needing to discard the frame entirely; they are calibrated about and admit what they don't know. they change gradually as we learn more. other frames work empirically but are a dead end epistemologically because they aren't willing to admit some of their false claims. for example, many woo frames capture a grain of truth that works empirically, but come with a flawed epistemology that prevents them from generating novel and true insights.
often it is better to be confined inside a well trodden frame than to be fully unconstrained. the space of all possible actions is huge, and many of them are terrible. on the other hand, staying inside well trodden frames forever substantially limits the possibility of doing something extremely novel
it's (sometimes) also a mechanism for seeking domains with long positive tail outcomes, rather than low variance domains
the financial industry is a machine that lets you transmute a dollar into a reliable stream of ~4 cents a year ~forever (or vice versa). also, it gives you a risk knob you can turn that increases the expected value of the stream, but also the variance (or vice versa; you can take your risky stream and pay the financial industry to convert it into a reliable stream or lump sum)
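concretely, the "~4 cents a year ~forever" number is just the perpetuity formula PV = c/r at an assumed ~4% rate; a tiny sketch (the rate is an assumption, not a claim about any particular product):

```python
# perpetuity arithmetic: a constant stream of c dollars/year, discounted at rate r,
# has present value c / r. at an assumed r of ~4%, $1 today <-> ~$0.04/year forever.
r = 0.04                       # assumed long-run rate
stream_per_dollar = 1.00 * r   # $1 lump sum -> ~$0.04/year stream
pv_of_stream = 0.04 / r        # ~$0.04/year stream -> ~$1 lump sum
print(stream_per_dollar, pv_of_stream)   # 0.04 1.0
```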
I think the most important part of paying for goods and services is often not the raw time saved, but the cognitive overhead avoided. for instance, I'd pay much more to avoid having to spend 15 minutes understanding something complicated (assuming there is no learning value) than to avoid 15 minutes of waiting. so figuring out the timetable and fare system, remembering to transfer, and navigating the station is plausibly more costly than the additional time spent in transit (especially in a new, unfamiliar city)
agree it goes in both directions. time when you hold critical context is worth more than time when you don't. it's probably at least sometimes a good strategy to alternate between working much more than sustainable and then recovering.
my main point is this is a very different style of reasoning than what people usually do when they talk about how much their time is worth.
people around these parts often take their salary and divide it by their working hours to figure out how much to value their time. but I think this actually doesn't make that much sense (at least for research work), and often leads to bad decision making.
time is extremely non fungible; some time is a lot more valuable than other time. further, the relation of amount of time worked to amount earned/value produced is extremely nonlinear (sharp diminishing returns). a lot of value is produced in short flashes of insight that you can't just get more of by spending more time trying to get insight (but rather require other inputs like life experience/good conversations/mentorship/happiness). resting or having fun can help improve your mental health, which is especially important for positive tail outcomes.
given that the assumptions of fungibility and linearity are extremely violated, I think it makes about as much sense as dividing salary by number of keystrokes or number of slack messages.
concretely, one might forgo doing something fun because it seems like the opportunity cost is very high, but actually diminishing returns means one more hour on the margin is much less valuable than the average implies, and having fun improves productivity in ways not accounted for when just considering the intrinsic value one places on fun.
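a toy illustration of the marginal vs average distinction (the value function here is completely made up; only the concavity matters):

```python
# toy model: value produced as a concave function of hours worked per week.
# the curve is arbitrary; the point is only that marginal value << average value
# once returns are diminishing.
def value(hours):
    return 1000 * hours ** 0.5          # arbitrary concave curve, dollars/week

hours = 50
average_per_hour = value(hours) / hours              # what salary / hours measures
marginal_per_hour = value(hours + 1) - value(hours)  # what the 51st hour is actually worth

print(f"average ~${average_per_hour:.0f}/hr, marginal ~${marginal_per_hour:.0f}/hr")
# -> average ~$141/hr, marginal ~$70/hr: the real opportunity cost of an hour of fun
#    is the marginal number, not the average one
```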
I'd be surprised if this were the case. next neurips I can survey some non native English speakers to see how many ML terms they know in English vs in their native language. I'm confident in my ability to administer this experiment on Chinese, French, and German speakers, which won't be an unbiased sample of non-native speakers, but hopefully still provides some signal.
only 2 people walked away without answering (after saying yes initially); they were not counted as yes or no. a few other people refused to answer at all, but this was also quite rare. the no responders seemed genuinely confused, as opposed to dismissive.
feel free to replicate this experiment at ICML or ICLR or next neurips.
not sure, i didn't keep track of this info. an important data point is that because essentially all ML literature is in english, non-anglophones generally either use english for all technical things, or at least codeswitch english terms into their native language. for example, i'd bet almost all chinese ML researchers would be familiar with the term CNN and it would be comparatively rare for people to say 卷积神经网络. (some more common terms like 神经网络 or 模型 are used instead of their english counterparts - neural network / model - but i'd be shocked if people didn't know the english translations)
overall i'd be extremely surprised if there were a lot of people who knew conceptually the idea of AGI but didn't know that it was called AGI in english
the specific thing i said to people was something like:
excuse me, can i ask you a question to help settle a bet? do you know what AGI stands for? [if they say yes] what does it stand for? [...] cool thanks for your time
i was careful not to say "what does AGI mean".
most people who didn't know just said "no" and didn't try to guess. a few said something like "artificial generative intelligence". one said "amazon general intelligence" (??). the people who answered incorrectly were obviously guessing / didn't seem very confident in the answer.
if they seemed confused by the question, i would often repeat and say something like "the acronym AGI" or something.
several people said yes but then started walking away the moment i asked what it stood for. this was kind of confusing and i didn't count those people.
I decided to conduct an experiment at neurips this year: I randomly surveyed people walking around in the conference hall to ask whether they had heard of AGI
I found that out of 38 respondents, only 24 could tell me what AGI stands for (63%)
we live in a bubble
I'm very excited about approaches to add hierarchy to SAEs - seems like an important step forward. In general, approaches that constrain latents in various ways that let us have higher L0 without reconstruction becoming trivial seem exciting.
I think it would be cool to get follow up work on bigger LMs. It should also be possible to do matryoshka with block size = 1 efficiently with some kernel tricks, which would be cool.
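for concreteness, a minimal sketch of the matryoshka idea as i understand it: nested prefixes of the latent vector are each trained to reconstruct the input, so earlier latents get pushed towards coarser, more general features. the dims, prefix sizes, and plain L1 penalty below are illustrative assumptions, not the exact setup from the original work:

```python
import torch
import torch.nn as nn

class MatryoshkaSAE(nn.Module):
    """minimal matryoshka-style SAE sketch (illustrative, not the published setup)."""

    def __init__(self, d_model=768, d_sae=16384, prefix_sizes=(1024, 4096, 16384)):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)
        self.prefix_sizes = prefix_sizes

    def forward(self, x):
        z = torch.relu(self.enc(x))                    # sparse latents
        recon_losses = []
        for k in self.prefix_sizes:
            # keep only the first k latents, zero out the rest
            z_k = torch.cat([z[..., :k], torch.zeros_like(z[..., k:])], dim=-1)
            recon_k = self.dec(z_k)
            recon_losses.append((recon_k - x).pow(2).mean())
        sparsity = z.abs().mean()                      # stand-in L1 sparsity penalty
        loss = sum(recon_losses) / len(recon_losses) + 1e-3 * sparsity
        return loss, z

# usage sketch on fake activations
sae = MatryoshkaSAE()
acts = torch.randn(32, 768)        # pretend these are LM residual stream activations
loss, latents = sae(acts)
loss.backward()
```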
I won't claim to be immune to peer pressure but at least on the epistemic front I think I have a pretty legible track record of believing things that are not very popular in the environments I've been in.
a medium with fewer limitations is strictly better for making good art, but it's also harder to identify good art among the sea of bad art because the medium alone is no longer as good a signal of quality
to be clear, a "winter/slowdown" in my typology is more about the vibes and could only be a few years counterfactual slowdown. like the dot-com crash didn't take that long for companies like Amazon or Google to recover from, but it was still a huge vibe shift
also to further clarify this is not an update I've made recently, I'm just making this post now as a regular reminder of my beliefs because it seems good to have records of this kind of thing (though everyone who has heard me ramble about this irl can confirm I've believed something like this for a while now)
people often say that limitations of an artistic medium breed creativity. part of this could be the fact that when it is costly to do things, the only things done will be higher effort
also a lot of people will suggest that alignment people are discredited because they all believed AGI was 3 years away, because surely that's the only possible thing an alignment person could have believed. I plan on pointing to this and other statements similar in vibe that I've made over the past year or two as direct counter evidence against that
(I do think a lot of people will rightly lose credibility for having very short timelines, but I think this includes a big mix of capabilities and alignment people, and I think they will probably lose more credibility than is justified because the rest of the world will overupdate on the winter)
it's quite plausible (40% if I had to make up a number, but I stress this is completely made up) that someday there will be an AI winter or other slowdown, and the general vibe will snap from "AGI in 3 years" to "AGI in 50 years". when this happens it will become deeply unfashionable to continue believing that AGI is probably happening soonish (10-15 years), in the same way that suggesting that there might be a winter/slowdown is unfashionable today. however, I believe in these timelines roughly because I expect the road to AGI to involve both fast periods and slow bumpy periods. so unless there is some super surprising new evidence, I will probably only update moderately on timelines if/when this winter happens
a take I've expressed a bunch irl but haven't written up yet: feature sparsity might be fundamentally the wrong thing for disentangling superposition; circuit sparsity might be more correct to optimize for. in particular, circuit sparsity doesn't have problems with feature splitting/absorption
the most valuable part of a social event is often not the part that is ostensibly the most important, but rather the gaps between the main parts.
- at ML conferences, the headline keynotes and orals are usually the least useful part to go to; the random spontaneous hallway chats and dinners and afterparties are extremely valuable
- when doing an activity with friends, the activity itself is often of secondary importance. talking on the way to the activity, or in the gaps between doing the activity, carry a lot of the value
- at work, a lot of the best conversations happen not in scheduled 1:1s and group meetings, but in spontaneous hallway chats or dinner groups
related: https://xkcd.com/217/