How do we prepare for final crunch time?
post by Eli Tyre (elityre) · 2021-03-30T05:47:54.654Z · LW · GW · 16 comments
This is a question post.
Contents
- Access
- A different kind of work
- Knowledge
- Research methodology / Scientific “rationality”
- Productivity
  - Practice no-cost-too-large productive periods
  - Optimize rest
- Picking up new tools
- Staying grounded and stable in spite of the stakes
- Answers: 38 Daniel Kokotajlo, 22 TsviBT, 8 Chris_Leong, 8 Vanilla_cabs, 2 Gurkenglas, 2 Gurkenglas, 1 fT3g0
- 16 comments
[Crossposted from Musings and Rough Drafts.]
Epistemic status: Brainstorming and first draft thoughts.
[Inspired by something that Ruby Bloom wrote and the Paul Christiano episode of the 80,000 Hours podcast.]
One claim I sometimes hear about AI alignment [paraphrase]:
It is really hard to know what sorts of AI alignment work are good this far out from transformative AI. As we get closer, we’ll have a clearer sense of what AGI / Transformative AI is likely to actually look like, and we’ll have much better traction on what kind of alignment work to do. In fact, MOST of the work of AI alignment is done in the final few years (or months) before AGI, when we’ve solved most of the hard capabilities problems already so we know what AGI will look like and we can work directly, with good feedback loops, on the sorts of systems that we want to align.
Usually, this is said to argue that the value of the alignment research being done today is primarily that of enabling future, more critical, alignment work. But “progress in the field” is only one dimension to consider in boosting and unblocking the work of alignment researchers in this last stretch.
In this post I want to take the above posit seriously, and consider the implications. If most of the alignment work that will be done is going to be done in the final few years before the deadline, our job in 2021 is mostly to do everything that we can to enable the people working on the problem in the crucial period (which might be us, or our successors, or both) so that they are as well equipped as we can possibly make them.
What are all the ways that we can think of that we can prepare now, for our eventual final exam? What should we be investing in, to improve our efficacy in those final, crucial, years?
The following are some ideas.
[In this post, I'm going to refer to this last stretch of a few months to a few years, "final crunch time", as distinct from just "crunch time", ie this century.]
Access
For this to matter, our alignment researchers need to be at the cutting edge of AI capabilities, and they need to be positioned such that their work can actually be incorporated into AI systems as they are deployed.
A different kind of work
Most current AI alignment work is pretty abstract and theoretical, for two reasons.
The first reason is a philosophical / methodological claim: There’s a fundamental “nearest unblocked strategy” / overfitting problem. Patches that correct clear and obvious alignment failures are unlikely to generalize fully; they'll only constrain unaligned optimization to channels that you can’t recognize. For this reason, some claim, we need to have an extremely robust, theoretical understanding of intelligence and alignment, ideally at the level of proofs.
The second reason is a practical consideration: we just don’t have powerful AI systems to work with, so there isn’t much that can be done in the way of tinkering and getting feedback.
That second objection becomes less relevant in final crunch time: in this scenario, we’ll have powerful systems 1) that will be built along the same lines as the systems that it is crucial to align and 2) that will have enough intellectual capability to pose at least semi-realistic “creative” alignment failures (ie, current systems are so dumb, and live in such constrained environments, that it isn’t clear how much we can learn about aligning literal superintelligences from them.)
And even if the first objection ultimately holds, theoretical understanding often (usually?) follows from practical engineering proficiency. It seems like it might be a fruitful path to tinker with semi-powerful systems: trying out different alignment approaches empirically, and tinkering to discover new approaches, and then backing up to do robust theory-building given much richer data about what seems to work.
I could imagine sophisticated setups that enable this kind of tinkering and theory building. For instance, I imagine a setup that includes:
- A “sandbox” that affords easy implementation of many different AI architectures and custom combinations of architectures, with a wide variety of easy-to-create, easy-to-adjust training schemes, and a full suite of interpretability tools. We could quickly try out different safety schemes, in different distributions, and observe what kinds of cognition and behavior result. (A toy code sketch of what such a sandbox interface might look like follows this list.)
- A meta AI that observes the sandbox, and all of the experiments therein, to learn general principles of alignment. We could use interpretability tools to use this AI as a “microscope [LW · GW]” on the AI alignment problem itself, abstracting out patterns and dynamics that we couldn’t easily have teased out with only our own brains. This meta system might also play some role in designing the experiments to run in the sandbox, to allow it to get the best data to test its hypotheses.
- A theorem prover that would formalize the properties and implications of those general alignment principles, to give us crisply specified alignment criteria by which we can evaluate AI designs.
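To make the first bullet a little more concrete, here is a minimal, purely illustrative sketch of the kind of sandbox interface I have in mind. Everything in it (the names, the idea of registering architectures, training schemes, and safety schemes as plain callables, the result record that a “meta” observer would consume) is an assumption of mine for illustration, not a description of any existing tool.

```python
# Toy sketch only: hypothetical names throughout, just to make the shape of the
# imagined sandbox workflow concrete. Not a real or proposed implementation.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ExperimentResult:
    """One record per run; this is what the 'meta' observer would consume."""
    architecture: str
    training_scheme: str
    safety_scheme: str
    metrics: Dict[str, float]           # e.g. task reward, measured proxy-gaming rate
    interpretability_traces: List[str]  # placeholders for probe / feature dumps


@dataclass
class Sandbox:
    """Registry of architectures, training schemes, and safety schemes to combine."""
    architectures: Dict[str, Callable] = field(default_factory=dict)
    training_schemes: Dict[str, Callable] = field(default_factory=dict)
    safety_schemes: Dict[str, Callable] = field(default_factory=dict)
    results: List[ExperimentResult] = field(default_factory=list)

    def run(self, arch: str, training: str, safety: str) -> ExperimentResult:
        model = self.architectures[arch]()                 # build a model
        model = self.safety_schemes[safety](model)         # wrap it in a safety scheme
        metrics = self.training_schemes[training](model)   # train and evaluate
        result = ExperimentResult(arch, training, safety, metrics, [])
        self.results.append(result)
        return result


if __name__ == "__main__":
    # Dummy components, standing in for real architectures / schemes.
    sandbox = Sandbox(
        architectures={"tiny-model": lambda: object()},
        training_schemes={"toy-rl": lambda model: {"reward": 1.0}},
        safety_schemes={"no-op": lambda model: model},
    )
    print(sandbox.run("tiny-model", "toy-rl", "no-op"))
    # sandbox.results is the dataset the hypothetical "meta AI" would study.
```

The point of the sketch is just that combinations of (architecture, training scheme, safety scheme) become cheap to enumerate, and every run leaves a structured record for later analysis.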
Obviously, working with a full system like this is quite different from abstract, purely theoretical work on decision theory or logical uncertainty. It is closer to the sort of experiments that the OpenAI and DeepMind safety teams have published, but even that is a pretty far cry from the kind of rapid-feedback tinkering that I’m pointing at here.
Given that the kind of work that leads to research progress might be very different in final crunch time than it is now, it seems worth trying to forecast what shape that work will take and trying to see if there are ways to practice doing that kind of work before final crunch time.
Knowledge
Obviously, when we get to final crunch time, we don’t want to have to spend any time studying fields that we could have studied in the lead-up years. We want to have already learned all the information and ways of thinking that we’ll want to know, then. It seems worth considering what fields we’ll wish we had known when the time comes.
The obvious contenders:
- Machine Learning
- Machine Learning interpretability
- All the Math of Intelligence that humanity has yet amassed [Probability theory, Causality, etc.]
Some less obvious possibilities:
- Neuroscience?
- Geopolitics, if it turns out that which technical approach is ideal hinges on important facts about the balance of power?
- Computer security?
- Mechanism design in general?
Do other subjects come to mind?
Research methodology / Scientific “rationality”
We want the research teams tackling this problem in final crunch time to have the best scientific methodology and the best cognitive tools / habits for making research progress, that we can manage to provide them.
This maybe includes skills or methods in the domains of:
- Ways to notice as early as possible if you’re following an ultimately-fruitless research path
- Noticing / Resolving / Avoiding blindspots
- Effective research teams
- Original seeing / overcoming theory blindness / hypothesis generation
- ???
Productivity
One obvious thing is to spend time now, investing in habits and strategies for effective productivity. It seems senseless to waste precious hours in the final crunch time due to procrastination or poor sleep. It is well worth it to solve those problems now. But aside from the general suggestion to get your shit in order and develop good habits, I can think of two more specific things that seem good to do.
Practice no-cost-too-large productive periods
There may be trades that could make people more productive on the margin, but are too expensive in regular life. For instance, I think that I might conceivably benefit from having a dedicated person whose job is to always be near me, so that I can duck with them (or have them "hold space" for me) with 0 friction. I’ve experimented a little bit with similar ideas (like having a list of people on call to duck with), but it doesn’t seem worth it for me to pay a whole extra person-salary to have the person be on call and in the same building, instead of on-call via Zoom.
But it is worth it at final crunch time.
It might be worth it to spend some period of time, maybe a week, maybe a month, every year, optimizing unrestrainedly for research productivity, with no heed to cost at all, so that we can practice how to do that. This is possibly a good thing to do anyway, because it might uncover trades that, on reflection, are worth importing into my regular life.
Optimize rest
One particular subset of personal productivity that jumps out at me: each person should figure out their actual optimal cadence of rest.
There’s a failure mode that ambitious people commonly fall into, which is working past the point when marginal hours of work are negative. When the whole cosmic endowment is on the line, there will be a natural temptation to push yourself to work as hard as you can, and forgo rest. Obviously, this is a mistake. Rest isn’t just a luxury: it is one of the inputs to productive work.
There is a second level of this error in which one, grudgingly, takes the minimal amount of rest time, and gets back to work. But the amount of rest time required to stay functional is not the optimal amount of rest, the amount that maximizes productive output. Eliezer mused [LW(p) · GW(p)] years ago that he felt kind of guilty about it, but maybe he should actually take two days off between research days, because the quality of his research seemed better on days when he happened to have had two rest days preceding.
In final crunch time, we want everyone to be resting the optimal amount that actually maximizes area under the curve, not the one that maximizes work-hours. We should do binary search now, to figure out what the optimum is.
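As a toy illustration of what "doing the search now" could look like (the post suggests binary search; with only a handful of discrete cadences, directly comparing logged weeks is the simplest version), you could record output per trial week at different rest cadences and pick the cadence that maximizes average weekly output rather than hours worked. All numbers below are fabricated, and in reality the measurements would be noisy and confounded.

```python
# Purely illustrative: made-up numbers, naive analysis. The point is just to
# compare cadences on total output per week, not on hours worked.
from collections import defaultdict
from statistics import mean

# (rest_days_per_week, measured_output_that_week) -- hypothetical log entries
trial_log = [(1, 34), (1, 31), (2, 40), (2, 43), (3, 38), (3, 37)]

def best_rest_cadence(log):
    by_cadence = defaultdict(list)
    for rest_days, output in log:
        by_cadence[rest_days].append(output)
    # Maximize average weekly output (area under the curve), not work-hours.
    return max(by_cadence, key=lambda k: mean(by_cadence[k]))

print(best_rest_cadence(trial_log))  # -> 2 in this made-up data
```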
Also, obviously, we should explore to discover highly effective methods of rest, instead of doing whatever random things seem good (unless, as it turns out, “whatever random thing seems good” is actually the best way to rest).
Picking up new tools
One thing that will be happening during this time is a flurry of new AI tools that can radically transform thinking and research, with perhaps increasingly radical tools coming at a rate of once a month or faster.
Being able to take advantage of those tools and start using them for research immediately, with minimal learning curve, seems extremely high leverage.
If there are things that we can do that increase the ease of picking up new tools and using them to their full potential (instead of, as is common, using only the features afforded by your old tools and only very gradually adopting the new ones), those things seem well worth investing in now.
Some thoughts (probably bad):
- Could we set up our workflows, somehow, such that it is easy to integrate new tools into them? Like if you already have a flexible, expressive research interface (something like Roam? or maybe, if the technology is more advanced by then, something like Neuralink?), and you’re used to regular changes in capability to the backend of the interface?
- Can we just practice? Can we have a competitive game of introducing new tools, and trying to orient to them and figure out how to exploit them as creatively as possible?
- Probably it should be some people’s full time job to translate cutting edge developments in AI into useful tools and practical workflows, and then to teach those workflows to the researchers?
- Can we design a meta-tool that helps us figure out how to exploit new tools? Is it possible to train an AI assistant specifically for helping us get the most out of our new AI tools?
- Can we map out the sort of constraints on human thinking and/or the sorts of tools that will be possible, in advance, so that we can practice with much weaker versions of those tools, and get a sense of how we would use them, so that we’re ready when they arrive?
- Can we try out new tools on psychedelics, to boost neuroplasticity? Is there some other way to temporarily weaken our neural priors? Maybe some kind of training in original seeing?
Staying grounded and stable in spite of the stakes
Obviously, being one of the few hundred people on whom the whole future of the cosmos rests, while the singularity is happening around you, and you are confronted with the stark reality of how doomed we are, is scary and disorienting and destabilizing.
I imagine that that induces all kinds of psychological pressures, that might find release in any of a number of concerning outlets: by deluding one’s self about the situation or slipping sideways into a more convenient world, by becoming manic and frenetic, by sinking into immovable depression.
We need our people to have the virtue of being able to look the problem in the eye, with all of its terror and disorientation, and stay stable enough to make tough calls, and make them sanely.
We’re called to cultivate a virtue (or maybe a set of virtues) of which I don’t know the true name, but which involves courage and groundedness, and determination-without-denial.
I don’t know what is entailed in cultivating that virtue. Perhaps meditation? Maybe testing one’s self at literal risk to one’s life? I would guess that people in other times and places, who needed to face risk to their own lives and that of their families, and take action anyway, did have this virtue, or some part of it, and it might be fruitful to investigate those cultures and how that virtue was cultivated.
Any more ideas?
Answers
Thanks, this is a great thing to be thinking about and a good list of ideas!
Do other subjects come to mind?
Public speaking skills, persuasion skills, debate skills, etc. [LW · GW]
Practice no-cost-too-large productive periods
I like this idea. At AI Impacts we were discussing something similar: having "fire drills" where we spend a week (or even just a day) pretending that a certain scenario has happened, e.g. "DeepMind just announced they have a turing-test-passing system and will demo it a week from now; we've got two journalists asking us for interviews and need to prep for the emergency meeting with the AI safety community tonight at 5." We never got around to testing out such a drill but I think variants on this idea are worth exploring. Inspired by what you said, perhaps we could have "snap drills" where suddenly we take our goals for the next two months and imagine that they need to be accomplished in a week instead, and see how much we can do. (Additionally, ideas like this seem like they would have bonus effects on morale, teamwork, etc.)
I don’t know what is entailed in cultivating that virtue. Perhaps meditation? Maybe testing one’s self at literal risk to one’s life?
This virtue is extremely important to militaries. Does any military use meditation as part of its training? I would guess that the training given to medics and officers (soldiers for whom clear thinking is especially important) might have some relevant lessons. Then again, maybe the military deals with this primarily by selecting the right sort of people rather than taking arbitrary people and training them. If so, perhaps we should look into applying similar selection methods in our own organizations to identify people to put in charge when the time comes.
Any more ideas?
In this post [LW · GW] I discuss some:
Perhaps it would be good to have an Official List of all the AI safety strategies, so that whatever rationale people give for why this AI is safe can be compared to the list. (See this prototype list. [LW · GW])
Perhaps it would be good to have an Official List of all the AI safety problems, so that whatever rationale people give for why this AI is safe can be compared to the list, e.g. "OK, so how does it solve outer alignment? What about mesa-optimizers? What about the malignity of the universal prior? I see here that your design involves X; according to the Official List, that puts it at risk of developing problems Y and Z..." (See this prototype list. [LW · GW])
Perhaps it would be good to have various important concepts and arguments re-written with an audience of skeptical and impatient AI researchers in mind, rather than the current audience of friends and LessWrong readers.
Thinking afresh, here's another idea: I have a sketch of a blog post titled "What Failure Feels Like." The idea is to portray a scenario of doom in general, abstract terms (like Paul's post does, as opposed to writing a specific, detailed story) but with a focus on how it feels to us AI-risk-reducers, rather than focusing on what the world looks like in general or what's going on inside the AIs. I decided it would be depressing and not valuable to write. However, maybe it would be valuable as a thing people could read to help emotionally prepare/steel themselves for the time when they "are confronted with the stark reality of how doomed we are." IDK.
I guess overall my favorite idea is to just periodically spend time thinking about what you'd do if you found out that takeoff was happening soon. E.g. "Deepmind announces turing-test system" or "We learn of convincing roadmap to AGI involving only 3 OOMs more compute" or "China unveils project to spend +7 OOMs on a single training run by 2030, with lesser training runs along the way" I think that the exercise of thinking about near-term scenarios and then imagining what we'd do in response will be beneficial even on long timelines, but certainly super beneficial on short timelines (even if, as is likely, none of the scenarios we imagine come to pass).
↑ comment by juliawise · 2021-04-01T19:48:23.036Z · LW(p) · GW(p)
I was coming to say something similar [edited to add: about communication skills.]
I don't know much about this field, but one comparison that comes to mind is Ignaz Semmelweis who discovered that hand-cleaning prevented hospital deaths, but let his students write it up instead of trying to convince his colleagues more directly. The message got garbled, his colleagues thought he was a crank, and continental Europe's understanding of germ theory was delayed by 60 years as a result.
↑ comment by Nora_Ammann · 2021-04-01T10:09:41.509Z · LW(p) · GW(p)
Regarding "Staying grounded and stable in spite of the stakes":
I think it might be helpful to unpack the virtue/skill(s) involved according to the different timescales at which emergencies unfold.
For example:
1. At the time scale of minutes or hours, there is a virtue/skill of "staying level-headed in a situation of acute crisis". This is the sort of skill you want your emergency doctor or firefighter to have. (When you pointed to the military, I think you in part pointed to this scale, but I assume not only.)
From talking to people who do or did jobs like this, a typical pattern seems to be that some types of people, when in situations like this, basically "freeze" and others basically move into a mode of "just functioning". There might be some margin for practice here (maybe you freeze the first time around and are able to snap out of the freeze the second time around, and after that, you can "remember" what it feels like to shift into functioning mode ever after) but, according to the "common wisdom" in these professions (as I understand it), mostly people seem to fall in one or the other category.
The sort of practice that I see being helpful here is a) overtraining on whatever skill you will need in the moment (e.g. imagine the emergency doctor) such that you can hand over most cognitive work to your autopilot once the emergency occurs; and b) training the skill of switching from freeze into high-functioning mode. I would expect "drill-type practices" are the most apt to get at that, but as noted above I don't know how large the margin for improvement is. (A subtlety here: there seems to be a massive difference between "being the first person to switch into functioning mode", vs "switching into functioning mode after (literally or metaphorically speaking) someone screamed in your face to get moving". (Thinking of the military here.))
All that said, I don't feel particularly excited for people to start doing a bunch of drill practice or the like. I think there are possible extreme scenarios of "narrow hingy moments" that will involve this skill, but overall this doesn't seem to me to be the thing that is most needed/with the highest EV.
(Probably also worth putting some sort of warning flag here: genuinely high-intensity situations can be harmful to people's psyche, so one should be very cautious about experimenting with things in this space.)
2. Next, there might be a related virtue/skill at the timescale of weeks and months. I think the pandemic, especially from ~March to May/June, is an excellent example of this, and was also an excellent learning opportunity for people involved in some time-sensitive COVID-19 problem. I definitely think I've gained some gears on what a genuine (i.e. genuinely high-stakes) 1-3 month sprint involves, and what challenges and risks are involved for you as an "agent" who is trying to also protect their agency/ability to think and act (though I think others have learnt and been stress-tested much more than I have).
Personally, my sense is that this is "harder" than the thing in 1., because you can't rely on your autopilot much, and this makes things feel more like an adaptive rather than technical problem (where the latter is a problem where the solution is basically clear and you just have to do it, and the former is a problem where most of the work needed is in figuring out the solution, not so much (necessarily) in executing it).
One difficulty is that this skill/virtue involves managing your energy, not only spending it well. Knowing yourself and how your energy and motivation structures work - and in particular how they work in extreme scenarios - seems very important. I can see how people who have meditated a lot have gained valuable skills here. I don't think it's the only way to get these skills, and I expect the thing that is paying off here is more "being able to look back on years of meditation practice and the ways this has rewired one's brain in some deep sense" rather than "benefits from having a routine to meditate" or something like this.
During the first couple of COVID-19 months, I was also surprised by how "doing well at this" was more a question of collective rationality than I would have thought (by collective rationality I mean things like: ability to communicate effectively, ability to mobilise people/people with the right skills, ability to delegate work effectively). There is still a large individual component of "staying on top of it all/keeping the horizon in sight" such that you are able to make hard decisions (which you will be faced with en masse).
I think it could be really good to collect lessons learnt from the folks involved in some EA/rationalist-adjacent COVID-19 projects.
3. The scale of ~(a few) years seems quite similar in type to 2. The main thing that I'd want to add here is that the challenge of dealing with strong uncertainty while the stakes are massive can be very psychologically challenging. I do think meditation and related practices can be helpful in dealing with that in a way that is both grounded and not flinching from the truth.
I find myself wondering whether the military does anything to help soldiers prepare for the act of "going to war", where the possibility of death is extremely real. I imagine they must do things to support people in this process. It's not exactly the same, but there certainly are parallels with what we want.
↑ comment by Kaj_Sotala · 2021-04-01T10:49:10.637Z · LW(p) · GW(p)
Does any military use meditation as part of its training?
Yes, e.g.:
This [2019] winter, Army infantry soldiers at Schofield Barracks in Hawaii began using mindfulness to improve shooting skills — for instance, focusing on when to pull the trigger amid chaos to avoid unnecessary civilian harm.
The British Royal Navy has given mindfulness training to officers, and military leaders are rolling it out in the Army and Royal Air Force for some officers and enlisted soldiers. The New Zealand Defence Force recently adopted the technique, and military forces of the Netherlands are considering the idea, too.
This week, NATO plans to hold a two-day symposium in Berlin to discuss the evidence behind the use of mindfulness in the military.
A small but growing group of military officials support the techniques to heal trauma-stressed veterans, make command decisions and help soldiers in chaotic battles.
“I was asked recently if my soldiers call me General Moonbeam,” said Maj. Gen. Piatt, who was director of operations for the Army and now commands its 10th Mountain Division. “There’s a stereotype this makes you soft. No, it brings you on point.”
The approach, he said, is based on the work of Amishi Jha, an associate professor of psychology at the University of Miami. She is the senior author of a paper published in December about the training’s effectiveness among members of a special operations unit.
The paper, in the journal Progress in Brain Research, reported that the troops who went through a monthlong training regimen that included daily practice in mindful breathing and focus techniques were better able to discern key information under chaotic circumstances and experienced increases in working memory function. The soldiers also reported making fewer cognitive errors than service members who did not use mindfulness.
The findings, which build on previous research showing improvements among soldiers and professional football players trained in mindfulness, are significant in part because members of the special forces are already selected for their ability to focus. The fact that even they saw improvement speaks to the power of the training, Dr. Jha said. [...]
Mr. Boughton has thought about whether mindfulness is anathema to conflict. “The purists would say that mindfulness was never developed for war purpose,” he said.
What he means is that mindfulness is often associated with peacefulness. But, he added, the idea is to be as faithful to compassionate and humane ideals as possible given the realities of the job.
Maj. Gen. Piatt underscored that point, describing one delicate diplomatic mission in Iraq that involved meeting with a local tribal leader. Before the session, he said, he meditated in front of a palm tree, and found himself extremely focused when the delicate conversation took place shortly thereafter.
“I was not taking notes. I remember every word she was saying. I wasn’t forming a response, just listening,” he said. When the tribal leader finished, he said, “I talked back to her about every single point, had to concede on some. I remember the expression on her face: This is someone we can work with.”
↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-04-01T11:09:52.215Z · LW(p) · GW(p)
Hmmm, if this is the most it's been done, then that counts as a No in my book. I was thinking something like "Ah yes, the Viet Cong did this for most of the war, and it's now standard in both the Vietnamese and Chinese armies." Or at least "Some military somewhere has officially decided that this is a good idea and they've rolled it out across a large portion of their force."
I speculate (based on personal glimpses, not based on any stable thing I can point to) that there's many small sets of people (say of size 2-4) who could greatly increase their total output given some preconditions, unknown to me, that unlock a sort of hivemind. Some of the preconditions include various kinds of trust, of common knowledge of shared goals, and of person-specific interface skill (like speaking each other's languages, common knowledge of tactics for resolving ambiguity, etc.).
[ETA: which, if true, would be good to have already set up before crunch time.]
One of the biggest considerations would be the process for activating "crunch time". In what situations should crunch time be declared? Who decides? How far out would we want to activate and would there be different levels? Are there any downsides of such a process including unwanted attention?
If these aren't discussed in advance, then I imagine that far too much of the available time could be taken up by deciding whether to activate crunch time protocols or not.
PS. I actually proposed here that we might be able to get a superintelligence to solve most of the problem of embedded agency by itself. I'll try to write it up into a proper post soon.
There is probably a class of people for whom working on AI alignment is not worth it/optimal/their concern before crunch time, but becomes their main focus once crunch time is officially declared. Something akin to sleeper agents, if you will.
There should be a network ready to tap these people's assets/skills when the signal is given.
↑ comment by Chris_Leong · 2021-04-05T03:39:05.154Z · LW(p) · GW(p)
Strongly agree. There's lots of people who might not seem worth funding at the moment, but when it comes to crunch time, EA should be prepared to burn money.
Future productivity tools would include text/workflow analysis. This goes better with more data, so you could record all you do, be it via screen recording, a time tracker program, or a keylogger. In particular, if what you do all day is think, write down your stream of consciousness if your thoughts look like a stream of words and your fingers can keep up. GPT-4 might well look back over your diary and tell you the paper you missed.
Is there an open-source lifelogging app, so privacy-conscious people's data isn't lost to the void?
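As a minimal illustration of the kind of logging this comment gestures at (my own toy code, not an existing app; file name and format are arbitrary choices), an append-only, timestamped plain-text log is enough for any future analysis tool to consume:

```python
# Toy lifelogging sketch: append-only, timestamped plain-text entries, kept in a
# format simple enough that any later tool (or model) can read the whole history.
import datetime
import pathlib

LOG_PATH = pathlib.Path("stream_of_consciousness.log")

def log_thought(text: str) -> None:
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(f"{stamp}\t{text}\n")

if __name__ == "__main__":
    log_thought("Skimming a new interpretability paper; does it bear on our agenda?")
```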
My productivity appeared to go up by an order of magnitude a month ago when I started Vyvanse. Switching from playing video games all day to doing math all day went from a sense of "that ought to be possible somehow, can't the doctor just wave a wand" to effortless. (Although I still procrastinate on bureaucratic obligations.) Know your amphetamines, talk to your doctor. If we take folk wisdom into account and model a chance of polymorphing into a junkie on the streets or a hippie in the church, I'm not pulling Merlin's name on you to shut up and calculate, because I am not that wise, nor that comparatively advantaged to unilaterally command the community. But I am telling you to calculate whether you are in my previous position, then listen to the more conservative of your calculations and your gut.
If the crunch happens, wouldn't you profit from setting it up like a company? I guess if AI Alignment ends up being a crisis that humanity only deals with at the last minute, but then with full effort, the funding will be there.
Get most of the people in the same office (relocate if necessary), have a company structure that is proven to work (hierarchies where necessary but not too much), get momentum going so you motivate each other to work longer and more productively.
Another consideration is that a lot of people who weren't doing AI Safety but were operating in a similar space might join the effort if the crunch happens, like BioNTech developing the COVID vaccine although they were working on cancer before that.
This includes all big tech companies, so in case of a crunch the space of people doing Alignment Work might look very different.
16 comments
Comments sorted by top scores.
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-04-01T02:06:06.943Z · LW(p) · GW(p)
Seems rather obvious to me that the sort of person who is like, "Oh, well, we can't possibly work on this until later" will, come Later, be like, "Oh, well, it's too late to start doing basic research now, we'll have to work with whatever basic strategies we came up with already."
comment by TurnTrout · 2021-03-30T14:50:08.923Z · LW(p) · GW(p)
For this to matter, our alignment researchers need to be at the cutting edge of AI capabilities, and they need to be positioned such that their work can actually be incorporated into AI systems as they are deployed.
If we become aware that a lab will likely deploy TAI soon, other informed actors will probably become aware as well. This implies that many people would be trying to influence and gain access to this lab. Therefore, we should already have AI alignment researchers in positions of power within the lab before this happens.
↑ comment by Eli Tyre (elityre) · 2021-03-31T08:07:40.716Z · LW(p) · GW(p)
Strong agree.
comment by DanielFilan · 2021-03-31T23:53:28.376Z · LW(p) · GW(p)
Most current AI alignment work is pretty abstract and theoretical, for two reasons.
FWIW, this is not obvious to me (or at least depends a lot on what you mean by 'AI alignment'). Work at places like OpenAI, CHAI, and DeepMind tends to be relatively concrete.
↑ comment by DanielFilan · 2021-03-31T23:58:28.833Z · LW(p) · GW(p)
Also if you count work done by people not publicly identified as motivated by existential risk, I think the concrete:abstract ratio will increase.
comment by Raemon · 2021-04-01T05:30:40.305Z · LW(p) · GW(p)
Curated.
I found this a surprisingly obvious set of strategic considerations (and meta-considerations), that for some reason I'd never seen anyone actually attempt to tackle before.
I found the notion of practicing "no cost too large" periods quite interesting. I'm somewhat intimidated by the prospect of trying it out, but it does seem like a good idea.
comment by Eli Tyre (elityre) · 2022-12-12T14:52:34.075Z · LW(p) · GW(p)
Since this got nominated, now's a good time to jump in and note that I wish that I had chosen different terminology for this post.
I was intending for "final crunch time" to be a riff on Eliezer saying, here [LW · GW], that we are currently in crunch time.
This is crunch time for the whole human species, and not just for us but for the intergalactic civilization whose existence depends on us. This is the hour before the final exam and we're trying to get as much studying done as possible.
I said explicitly, in this post, "I'm going to refer to this last stretch of a few months to a few years, 'final crunch time', as distinct from just 'crunch time', ie this century."
But predictably, in retrospect, that one sentence didn't stick with people when recalling the post later, and the "final" in "final crunch time" gets dropped.
I would have preferred to preserve the resonance of Eliezer's original point, that right now is crunch time, when we're trying to prepare as best we can for our pass-fail test as a species, and I think that point gets eaten by my choice of terminology.
I'm inclined to go back and rewrite this post, as "How do we prepare for the AI endgame?"
(That terminology is better at not clashing with Eliezer's original point, though I also think that it is somewhat less evocative of the right thing. "Endgame" gives the impression of "the final, last steps of a grand strategy, cooperating and competing with other strategic actors on the gameboard". "Crunch time" gives the impression of "frantically trying to make sense of what's happening in the hope of hacking together a passable solution." Which I think is closer to the spirit of what we should be preparing for.)
If I do that rewrite, it seems like doing it before this essay gets packaged into a book seems ideal.
But maybe it's too late, and changing terminology after people have been exposed to it is counterproductive?
Do people have thoughts?
↑ comment by the gears to ascension (lahwran) · 2022-12-12T18:01:46.000Z · LW(p) · GW(p)
I think that if we were in crunch time in 2010, your phrasing is fine, because we're in final crunch time now. If you have an alarm, please ring it. Though, also make sure to mention that coprotective safety is looking tractable and likely to succeed if we try! Despite the drawbacks, the Diplomacy AI gave me a lot of hope that we can solve the hard cooperation problem.
comment by johnswentworth · 2021-03-30T19:19:25.112Z · LW(p) · GW(p)
Relevant topic of a future post: some of the ideas from Risks From Learned Optimization [? · GW] or the Improved Good Regulator Theorem [LW · GW] offer insights into building effective institutions and developing flexible problem-solving capacity.
Rough intuitive idea: intelligence/agency are about generalizable problem-solving capability. How do you incentivize generalizable problem-solving capability? Ask the system to solve a wide variety of problems, or a problem general enough to encompass a wide variety.
If you want an organization to act agenty, then a useful technique is to constantly force the organization to solve new, qualitatively different problems. An organization in a highly volatile market subject to lots of shocks or distribution shifts will likely develop some degree of agency naturally.
Organizations with an adversary (e.g. traders in the financial markets) will likely develop some degree of agency naturally, as their adversary frequently adopts new methods to counter the organization's current strategy. Red teams are a good way to simulate this without a natural adversary.
Some organizations need to solve a sufficiently-broad range of problems as part of their original core business that they develop some degree of agency in the process. These organizations then find it relatively easy to expand into new lines of business. Amazon is a good example.
Conversely, businesses in stable industries facing little variability will end up with little agency. They can't solve new problems efficiently, and will likely be wiped out if there's a large shock or distribution shift in the market. They won't be good at expanding or pivoting into new lines of business. They'll tend to be adaptation-executors rather than profit-maximizers, to a much greater extent than agenty businesses.
This all also applies at a personal level: if you want to develop general problem-solving capability, then tackle a wide variety of problems. Try problems in many different fields. Try problems with an adversary. Try different kinds of problems, or problems with different levels of difficulty. Don't just try to guess which skills or tools generalize well, go out and find out which skills or tools generalize well.
If we don't know what to expect from future alignment problems, then developing problem-solving skills and organizations which generalize well is a natural strategy.
comment by jungofthewon · 2021-04-01T14:55:31.097Z · LW(p) · GW(p)
Access
Alignment-focused policymakers / policy researchers should also be in positions of influence.
Knowledge
I'd add a bunch of human / social topics to your list e.g.
- Policy
- Every relevant historical precedent
- Crisis management / global logistical coordination / negotiation
- Psychology / media / marketing
- Forecasting
Research methodology / Scientific “rationality,” Productivity, Tools
I'd be really excited to have people use Elicit with this motivation. (More context here [LW(p) · GW(p)] and here.)
Re: competitive games of introducing new tools, we did an internal speed Elicit vs. Google test to see which tool was more efficient for finding answers or mapping out a new domain in 5 minutes. We're broadly excited to structure and support competitive knowledge work and optimize research this way.
comment by johnswentworth · 2021-03-30T19:21:11.849Z · LW(p) · GW(p)
Re: picking up new tools, skills and practice designing and building user interfaces, especially to complex or not-very-transparent systems, would be very-high-leverage if the tool-adoption step is rate-limiting.
↑ comment by Eli Tyre (elityre) · 2021-03-31T08:07:23.073Z · LW(p) · GW(p)
I suspect that it becomes more and more rate limiting as technological progress speeds up.
Like, to a first approximation, I think there's a fixed cost to learning to use and take full advantage of a new tool. Let's say that cost is a few weeks of experimentation and tinkering. If importantly new tools are invented on a cadence of once every 3 years, that fixed cost is negligible. But if importantly new tools are dropping every week, the fixed cost becomes much more of a big deal.
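To make that concrete with a back-of-envelope calculation (assuming a two-week learning cost, which is just an illustrative number standing in for "a few weeks"):

```python
# Illustrative arithmetic only: fraction of time spent just learning new tools,
# as a function of how often importantly-new tools arrive.
learning_cost_weeks = 2.0

for cadence_weeks in (156, 52, 4, 1):  # new tool every ~3 years, 1 year, month, week
    overhead = learning_cost_weeks / cadence_weeks
    print(f"new tool every {cadence_weeks:>3} weeks -> "
          f"{overhead:.0%} of time spent learning tools")
# ~1% at a 3-year cadence, 50% at a monthly cadence, and 200% at a weekly
# cadence, i.e. at that point you can never catch up.
```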
comment by Donald Hobson (donald-hobson) · 2021-04-07T21:55:49.724Z · LW(p) · GW(p)
I don't actually think "It is really hard to know what sorts of AI alignment work are good this far out from transformative AI." is very helpful.
It is currently fairly hard to tell what is good alignment work. A week from TAI, either good alignment work will be easier to recognise, because of alignment progress not strongly correlated with capabilities, or good alignment research will be just as hard to recognise. (More likely the latter.) I can't think of any safety research that can be done on GPT-3 that can't be done on GPT-1.
In my picture, research gets done and theorems proved, researcher population grows as funding increases and talent matures. Toy models get produced. Once you can easily write down a description of a FAI with unbounded compute, that's when you start to look at algorithms that have good capabilities in practice.
comment by Nicole Ross (nicole-ross) · 2021-04-01T14:23:28.253Z · LW(p) · GW(p)
I found this very helpful and motivating to read — it feels like this made clear and specific some things I had more hazily been thinking about. Thanks for writing it up.
↑ comment by Eli Tyre (elityre) · 2021-06-11T00:20:12.369Z · LW(p) · GW(p)
Glad to help.