What are concrete examples of potential "lock-in" in AI research?
post by Grue_Slinky
score: 17 (8 votes) ·
This is a question post.
I had some colleagues watch Ben Garfinkel's talk, "How sure are we about this AI stuff?", which, among other things, pointed out that it's often difficult to change the long-term trajectory of a technology. For instance, electricity, the printing press, and agriculture were all transformative technologies, but even if we had recognized their importance in advance, it's hard to see what we could really have changed about them in the long term.
In general, when I look at technological development/adoption, I tend to see people following local economic incentives wherever they lead, and it often seems hard to change these gradients without some serious external pressures (forceful governments, cultural taboos, etc.). I don't see that many "parallel tracks" where a farsighted agent could've set things on a different track by pulling the right lever at the right time. A counterexample is the Qwerty vs. Dvorak keyboard, where someone with enough influence may well have been able to get society to adopt the better keyboard from a longtermist perspective.
This causes one to look at cases of "lock-in": times where we could have plausibly taken any one of multiple paths, and this decision:
a) could have been changed by a relatively small group of farsighted agents
b) had significant effects that lasted decades or more
A lot of the best historical examples of this aren't technological--the founding of major religions, the writing of the US constitution, the Bretton Woods agreement--which is maybe some small update towards political stuff being important from a longtermist perspective.
Nevertheless, there are examples of lock-in for technological development. In a group discussion after watching Garfinkel's talk, Lin Eadarmstadt asked what examples of lock-in there might be for AI research. I think this is a really good question, because it may be one decent way of locating things we can actually change in the long term. (Of course, not the only way by any means, but perhaps a fruitful one.)
After brainstorming this, it felt hard to come up with good examples, but here's two sort-of-examples:
First, there's the programming language that ML is done in. Right now, it's almost entirely Python. In some not-totally-implausible counterfactual, it's done in OCaml, where the type-checking is very strict, and hence certain software errors are less likely to happen. On this metric, Python is pretty much the least safe language for ML.
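To make the type-checking point concrete, here is a minimal Python sketch. The function names and the stray string are hypothetical, chosen only for illustration. Python accepts the mixed list without complaint and fails only when the sum is actually computed, whereas OCaml's type checker would reject a list mixing floats and strings (`[0.5; 0.25; "0.75"]`) before the program ever runs.

```python
def mean_loss(losses):
    """Average a list of per-example losses (expected to be floats)."""
    return sum(losses) / len(losses)


def run_training_step():
    # Hypothetical step: imagine the expensive computation has already run.
    # A string slipped in, e.g. from a parsed log line or a config value.
    losses = [0.5, 0.25, "0.75"]
    return mean_loss(losses)


def try_step():
    """Run the step and report how it fails, instead of crashing."""
    try:
        return run_training_step()
    except TypeError as exc:
        # Only discovered at runtime, when float + str is attempted.
        return f"runtime failure: {type(exc).__name__}"


if __name__ == "__main__":
    print(try_step())
```

Optional type hints plus an external checker can catch some errors of this kind in Python, but nothing in the language forces anyone to use them.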
Of course, even if we agree the OCaml counterfactual is better in expectation, it's hard to see how anyone could've nudged ML towards it, even in hindsight. It would've been much easier when ML was a smaller field than it is now, hence we can say Python's been "locked in". On the other hand, I've heard murmurs about Swift attempting to replace it, with Swift having better-than-zero type safety.
Caveats: I don't take these "murmurs" seriously, it seems very unlikely to me that AGI goes catastrophically wrong due to a lack of type safety, and I don't think it's worth the time of anyone here to worry about this. This is mostly just a hopefully illustrative example.
Second, there's the way objectives are specified in deep reinforcement learning (DRL). Currently, DRL is usually done by specifying a reward function upfront, and having the agent figure out how to maximize it. As we know, reward functions are often hard to specify properly in complex domains, and this is one bottleneck on DRL capabilities research. Still, in my naive thinking, I can imagine a plausible scenario where DRL researchers get used to "fudging it": getting agents to sort-of-learn lots of things in a variety of relatively complex domains where the reward functions are hacked together by grad student descent, and after many years of hardware overhang has set in, someone finally figures out a way to stitch these together to get an AGI (or something "close enough" to do some serious damage).
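As a toy illustration of what "specifying a reward function upfront" means, here is a minimal sketch. It is not deep RL: the corridor environment, the `specified_reward` function, and the greedy one-step maximizer standing in for a learning algorithm are all assumptions made up for this example.

```python
# Toy 1-D corridor with positions 0..4 and the goal at position 4.
GOAL = 4


def step(position, action):
    """Move left (-1) or right (+1), staying inside the corridor."""
    return max(0, min(GOAL, position + action))


def specified_reward(position, action):
    """Hand-written reward: the designer decides upfront what counts as good.

    Here it is the negative distance to the goal after taking the action.
    """
    return -abs(GOAL - step(position, action))


def greedy_action(position):
    """Pick the action with the highest immediate specified reward."""
    return max((-1, +1), key=lambda a: specified_reward(position, a))


def rollout(start, max_steps=10):
    """Follow the greedy policy and return (final position, total reward)."""
    position, total = start, 0
    for _ in range(max_steps):
        action = greedy_action(position)
        total += specified_reward(position, action)
        position = step(position, action)
        if position == GOAL:
            break
    return position, total


if __name__ == "__main__":
    print(rollout(0))
```

Everything the agent "wants" here is fixed by the few lines in `specified_reward`. In a domain this small that is easy to get right; the worry above is about domains where it isn't, and the function gets patched by hand until the behavior looks acceptable.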
The main alternatives to reward specification are imitation learning, inverse RL, and DeepMind's reward modeling (see section 7 of this paper for a useful comparison). In my estimation, any of these approaches is probably safer than the "AGI via reward specification" path.
Of course, these don't clearly form 4 distinct tech paths, and I rate it > 40% that if AGI largely comes out of DRL, no one technique will claim all the major milestones along the way. So this is a pretty weak example of "lock-in", because I think, for instance, DRL researchers will flock to reward modeling if DeepMind unambiguously demonstrates its superiority over reward specification.
Still, I think there is an extent to which researchers become "comfortable" with research techniques, and if TensorFlow has extensive libraries for reward specification and every DRL textbook has a chapter "Heuristics for Fudging It", while other techniques are viewed as esoteric and have start-up costs to applying (and fewer libraries), then this may become a weak form of lock-in.
As I've said, those two are fairly weak examples. The former is a lock-in that happened a while ago that we probably can't change now, and it doesn't seem that important even if we could. The latter is a fairly weak form of lock-in, in that it can't withstand that much in the way of counter-incentives (compare with the Qwerty keyboard).
Still, I found it fun thinking about these, and I'm curious if people have any other ideas of potential "lock-in" for AI research? (Even if it doesn't have any obvious implications for safety).
answer by Wei_Dai
· score: 11 (6 votes) · LW
The spread of Tegmark Level IV, UDT, and related ideas may be an example of "lock-in" that has already happened (to varying degrees) within the rationalist, EA, and AI safety research communities, and could possibly happen to the wider AI research community. (It seems easy to imagine an alternate timeline in which these ideas never spread beyond a few obscure papers and blog posts, or do spread somewhat but are considered outlandish by most people.)
comment by Grue_Slinky
· score: 1 (1 votes) · LW
Huh, that's a good point. Whereas it seems probably inevitable that AI research would've eventually converged on something similar to the current D(R)L paradigm, we can imagine a lot of different ways AI safety could have looked right now instead. Which makes sense, since the latter is still young and in a kind of pre-paradigmatic philosophical stage, with little unambiguous feedback to dictate how things should unfold (and it's far from clear when substantially more of this feedback will show up).
I can imagine an alternate timeline where the initial core ideas/impetus for AI safety didn't come from Yudkowsky/LW, but from e.g. a) Bostrom/FHI b) Stuart Russell or c) some near-term ML safety researchers whose thinking gradually evolved as they thought about longer and longer timescales. And it's interesting to ask what the current field would consequently look like:
- Agent Foundations/Embedded Agency probably (?) wouldn't be a thing, or at least it might take some time for the underlying questions which motivate it to be asked in writing, let alone the actual questions within those agendas (or something close to them)
- For (c) primarily, it's unclear if the alignment problem would've been zeroed in on as the "central challenge", or how long this would take (note: I don't actually know that much about near-term concerns, but I can imagine things like verification, adversarial examples, and algorithmic fairness lingering around on center stage for a while).
- A lot of the focus on utility functions probably wouldn't be there
And none of that is to say that anything about those alternate timelines is better, but it is to say that a lot of the things I often associate with AI safety are only contingently related. This is probably obvious to a lot of people on here, and of course we have seen some of the Yudkowskian foundational framings of the problem be de-emphasized as non-LW people have joined the field.
On the other hand, as far as "lock-in" itself is concerned, it does seem like there's a certain amount of deference that EA has given MIRI/LW on some of the more abstruse matters where would-be critics don't want to sound stupid for lack of technical sophistication--UDT, Solomonoff, and similar stuff internal to agent foundations--and the longer any idea lingers around, and the farther it spreads, the harder it is to root out if we ever do find good reasons to overturn it. Although I'm not that worried about this, since those ideas are by definition only fully understood/debated by a small part of the community.
Also, it's my impression that most EAs believe in one-boxing, but not necessarily UDT. For instance, some apparently prefer EDT-like theories, which makes me think the relatively simple arguments for one-boxing have percolated pretty widely (and are probably locked in), but the more advanced details are still largely up for debate. I think similar things can be said for a lot of other things, e.g. "thinking probabilistically" is locked in but maybe not a lot of the more complicated aspects of Bayesian epistemology that have come out of LW.
answer by Wei_Dai
· score: 7 (3 votes) · LW
Another thing that I'd like to lock in for AI research is the idea of AI design as opportunity and obligation to address human safety problems. The alternative "lock-in" that I'd like to avoid is a culture where AI designers think it's someone else's job to prevent "misuse" of their AI, and never think about their users as potentially unsafe systems that can be accidentally or intentionally corrupted.
It may be hard to push this against local economic incentives, but if we can get people to at least pay lip service to it, then if an AGI project eventually gets enough slack against the economic incentives (e.g., it somehow gets a big lead over rival projects, or a government gets involved and throws resources at the project) then maybe it will put the idea into practice.
answer by FactorialCode
· score: 6 (3 votes) · LW
This isn't quite "lock in", but it's related in the sense that an outside force shaped the field of "deep learning".
I suspect the videogame industry, and the GPUs that were developed for it, have locked in the type of technologies we now know as deep learning. GPUs were originally ASICs developed for playing videogames, so there are specific types of operations they were optimized to perform.
I suspect that neural network architectures that leveraged these hardware optimizations outperformed other neural networks. Conv nets and Transformers are probably evidence of this. The former leverages convolution, and the latter leverages matrix multiplication. In turn, GPUs and ASICs have been optimized to run these successful neural networks faster, with NVIDIA rolling out Tensor Cores and Google deploying their TPUs.
Looking back, it's hard to say that this combination of hardware and software isn't a local optimum, and that if we were to redesign the whole stack from the bottom up, technologies with the capabilities of modern "deep learning" wouldn't look completely different.
It's not even clear how one could find another optimum in the space of algorithms+hardware at this point either. The current stack benefits both from open source contributions and massive economies of scale.
answer by PeterMcCluskey
· score: 1 (1 votes) · LW
a decentralized internet versus an internet under the central control of something like AOL.
Bitcoin energy usage.
electrical systems that provide plugs, voltages, and frequencies which are incompatible between countries.