Comments
Awesome ideas! These are some of the things missing for LLMs to have economic impact. Companies expected them to simply automate certain jobs, but that's an all-or-nothing approach that has never worked historically (until it eventually does, but we're not there yet).
One idea I thought of when reading Scott Aaronson's Reading Burden (https://scottaaronson.blog/?p=8217) is that people with interesting opinions and somewhat of a public presence have a TON of reading to do, not just to keep up with current events, but to observe people's reactions and see the trends in ideas in response to events. Perhaps this can be improved with LLMs:
Give the model a collection of your writings and latest opinions. Have it scour online posts and their comments from your favorite sources. Each post + comments section is one input, so we need longer context. Have it look for opportunities to share your viewpoint, and report whether your viewpoint has already been shared or refuted, or whether there are points not considered in your writings. If nothing, save yourself the effort! If something, highlight the important bits.
This might be too many LLM calls depending on the sources, so obviously a retrieval stage is in order. Or that bit can be done manually; we seem pretty good at finding handfuls of interesting-sounding articles, and we do this anyway while procrastinating.
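A minimal sketch of what I have in mind, assuming a hypothetical call_llm() wrapper around whatever model/API you use, and a pre-gathered list of threads (the retrieval or manual-curation step above):

```python
# Sketch only: call_llm() is a hypothetical LLM API wrapper, and `threads` is
# assumed to be a list of strings, each one post plus its full comment section.
MY_WRITINGS = open("my_essays_and_opinions.txt").read()

PROMPT_TEMPLATE = """Here are my past writings and current opinions:
{writings}

Here is a post and its comment section:
{thread}

Has my viewpoint already been stated or refuted in this thread? Are there
points raised that my writings do not address? Reply NOTHING_NEW if there is
no reason for me to engage; otherwise summarize the important bits."""

def triage(threads):
    reports = []
    for thread in threads:  # one post + comments = one long-context input
        answer = call_llm(PROMPT_TEMPLATE.format(writings=MY_WRITINGS, thread=thread))
        if "NOTHING_NEW" not in answer:
            reports.append((thread, answer))
    return reports  # only the threads worth a human look
```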
I'll probably get disagree points, but I wanted to share my reaction: I honestly don't mind the AI's output. I read it all and think it's just an elaboration of what you said. The only problem I noticed is that it's too long.
Then again, I'm not an amazing writer, and my critical skills aren't so great for critiquing style. I will admit I rarely use assistance, because I have a tight set of points I want to include, and explaining them all to the AI is almost the same as writing the post itself.
Thanks for this post! I have always been annoyed when on Reddit or even here, the response to poverty always goes back to, "but poor people have cell phones!" It all comes down to freedom -- the amount of meaningfully distinct actions one person can take in the world to accomplish their goals. If there are few real alternatives, and one's best options all involve working until exhaustion, it is not true freedom.
I agree, the poverty-restoring equilibrium is probably more complex than UBI alone can address -- maybe it's part of Moloch. I think the rents increasing by the UBI amount has something to do with demand inelasticity -- people will rent the same regardless of price -- so the price can rise until it hits the breaking point once again.
Nonetheless, UBI may still help. Also, I do think there are other concrete steps that can be taken. One cannot leave a horrible job for several reasons: (a) health insurance, (b) having a place to live, (c) having food, (d) school & giving children their best chance; but each of these can be tackled one by one. It may not solve the problem once and for all, but good quality public education (not funded by zip code), universal health insurance, and an adequate supply of housing, are all steps towards reducing the bottleneck imposed by one resource at a time.
The bottom line in my personal philosophy is this -- take direct action against those forces of poverty and Moloch. If there are unintended consequences, take direct action against them. Propose policies, and try them out. Cynicism about any interventions working is really wishful thinking by the wealthy elites. It's not coordinated, as you say. They want to believe the systems we have are really the best we can do, because what we have makes them powerful. Acknowledging the possibility that there is a better way would be uncomfortable for them both financially and psychologically!
Not my worst prediction, given the latest news!
That's fair. Here are some things to consider:
1 - I think 2017 was not that long ago. My hunch is that the low-level architecture of the network itself is not a bottleneck yet; I'd lean more towards training procedures and algorithms. I'd throw in RLHF and MoE as significant developments, and those are even more recent.
2 - I give maybe a 30% chance of a stall, in the case that little commercial disruption comes of LLMs. I think there will still be enough research going on at the major labs, and even universities working at a smaller scale give a decent chance at efficiency gains and other things the big labs can incorporate. Then again, if we agree that they won't build the power plant, that is also my main way of stalling the timeline 10 years. The reason I only put 30% is that I'm expecting multimodality and Aschenbrenner's "unhobblings" to get the industry a couple more years of chances to find profit.
I think it is plausible, though not obvious, that large language models have a fundamental issue with reasoning. However, I don't think this greatly impacts timelines. Here is my thinking:
I think timelines are fundamentally driven by scale and compute. We have a lot of smart people working on the problem, and there are a lot of obvious ways to address these limitations. Of course, given how research works, most of these ideas won't work, but I am skeptical of the idea that such a counter-intuitive paradigm shift is needed that nobody has even conceived of it yet. A delay of a couple of years is possible, perhaps if the current tech stack proves remarkably profitable and the funding goes directly into the current paradigm. But as compute becomes bigger and cheaper, it will be all the easier to rapidly try new ideas and architectures.
I think our best path forward to delaying timelines is to not build gigawatt scale data centers.
Is there a post in the Sequences about when it is justifiable to not pursue going down a rabbit hole? It's a fairly general question, but the specific context is a tale as old as time. My brother, who has been an atheist for decades, moved to Utah. After 10 years, he now asserts that he was wrong and his "rigorous pursuit" of verifying with logic and his own eyes, leads him to believe the Bible is literally true. I worry about his mental health so I don't want to debate him, but felt like I should give some kind of justification for why I'm not personally embarking on a bible study. There's a potential subtext of, by not following his path, I am either not that rational, or lack integrity. The subtext may not really be there, but I figure if I can provide a well thought out response or summarize something from EY, it might make things feel more friendly, e.g. "I personally don't have enough evidence to justify spending the time on this, but I will keep an open mind if any new evidence comes up."
I would pay to see this live at a bar or one of those county fairs (we had a GLaDOS cover band once, so it's not out of the question).
If we don't get a song like that, take comfort that GLaDOS's songs from the Portal soundtrack are basically the same idea as the Sydney reference. Link: https://www.youtube.com/watch?v=dVVZaZ8yO6o
Let me know if I've missed something, but it seems to me the hard part is still defining harm. In the one case, where we will use the model and calculate the probability of harm, if it has goals, it may be incentivized to minimize that probability. In the case where we have separate auxiliary models whose goals are to actively look for harm, then we have a deceptively adversarial relationship between these. The optimizer can try to fool the harm finding LLMs. In fact, in the latter case, I'm imagining models which do a very good job at always finding some problem with a new approach, to the point where they become alarms which are largely ignored.
Using his interpretability guidelines, and also human sanity-checking of all models within the system, I can see that we could probably minimize the failure modes we already know about, but again, once it gets sufficiently powerful, it may find something no human has thought of yet.
That's fair. I read the post but did not re-read it, and asking for "more" examples out of such a huge list seems like asking a bit too much. Still, I find the process of finding these examples somewhat fun, and for whatever reason had not found many of them too shocking, so I felt the instinct to keep searching.
Dissociative identity disorder would be an interesting case; I have heard there was much debate on whether it was real. As you know someone, I assume it's not exactly like you see in movies, and probably falls on a spectrum as discussed in this post?
One fear I have is that the open source community will come out ahead, and push for greater weight sharing of very powerful models.
Edit: To make this more specific, I mean that the open source community will become more attractive, because they will say: you cannot rely on individual companies whose models may or may not be available; you must build on top of open source. Related tweet:
https://twitter.com/ylecun/status/1726578588449669218
Whether their plan works or not, dunno.
One thing that would help me, and I'm not sure if others agree, would be some more concrete predictions. I think the historical examples of autism and being gay make sense, but they are quite normalized now, so that one can almost say, "That was previous generations. We are open minded and rational now". What are some new applications of this logic that would surprise us? Are these omitted due to some info hazard? Surely we can find some that are not. I am honestly having a hard time coming up with them myself, but here goes:
- There are more regular people who believe AI is an x-risk than let on -- optimistically, for us!
- There are more people in households with 7 figure incomes than you would expect. The data I always read in news articles seems to contradict this, but there are just way too many people in 2M+ homes driving Teslas in the bay area. Or maybe they happen to be very frugal in every other aspect of their life... Alternatively, there is more generational wealth than people let on, as there are many people who supposedly make under 6 figures, yet seem to survive in HCOL areas and participate in conspicuous consumption.
I also have a hard time with the "perfect crime" scenario described above. Even after several minutes of thinking, I can't quite convince myself it's happening all that much, but maybe I am limiting myself to certain types of crimes. Can someone also spell that one out? I get it at a high level, "we only see the dumb ones that got caught", but can't seem to make the leap from that, to "you probably know a burglar, murderer, or embezzler".
I share your disagreement with the original author as to the cause of the relief. For me, I find the modern day and age very confusing, and it's difficult to measure one's value to society. Any great idea you can think of, probably someone else has thought of it, and you have little chance to be important. In a zombie apocalypse, instead of thinking how to out-compete your fellow man with some amazing invention, you fall back to survival. Important things in this world, like foraging for food, fending off zombies, etc, have quicker reward, and it's easier in some sense to do what's right. Even if you're not the best at it, surely you can be a great worker, and there's little worry that you're doing more harm than good... just don't be stupid and call the horde. Sure, sometimes people do horrible things for survival, but if you want to be the hero, the choice is much clearer.
If we know they aren't conscious, then it is a non-issue. A random sample from conscious beings would land on the SAI with probability 0. I'm concerned we create something accidentally conscious.
I am skeptical it is easy to avoid. If it can simulate a conscious being, why isn't that simulation conscious? If consciousness is a property of the physical universe, then an isomorphic process would have the same properties. And if it can't simulate a conscious being, then it is not a superintelligence.
It can, however, possibly have a non-conscious outer-program... and avoid simulating people. That seems like a reasonable proposal.
Agree. Obviously alignment is important, but some of the strategies that involve always deferring to human preferences have always creeped me out in the back of my mind. It seems strange to create something so far beyond ourselves and have its values be ultimately those of a child or a servant. What if a random consciousness sampled from our universe in the future comes from it with probability almost 1? We probably have to keep that in mind too. Sigh, yet another constraint we have to add!
Hi Critch,
I am curious to hear more of your perspectives, specifically on two points I feel least aligned with, the empathy part, and the Microsoft part. If I hear more I may be able to update in your direction.
Regarding empathy with people working on bias and fairness, concretely, how do you go about interacting with and compromising with them?
My perspective: it's not so much that I find these topics not sufficiently x-risky (though that is true, too), but that I perceive a hostility to the very notion of x-risk from a subset of this same group. They perceive the real threat not as intelligence exceeding our own, but as misuse by other humans, or just human stupidity. Somehow this seems diametrically opposed to what we're interested in, unless I am missing something. I mean, there can be some overlap -- learning from RLHF can both reduce bias and teach an LLM some rudimentary alignment with our values. But the tails seem to come apart very rapidly after that. My fear is that this focus will be declared satisfied once we have sufficiently bland-sounding AIs, and then no more heed will be paid to AI safety.
I also tend to feel odd when it comes to AI bias/fairness training, because my fear is that some of the things we will ask the AI to learn are self-contradictory, which creeps me out a bit. If any of you have interacted with HR departments, they are full of these kinds of things.
Regarding Microsoft & Bing chat, (1) has Microsoft really gone far beyond the Overton window of what is acceptable? and (2) can you expand upon abusive use of AIs?
My perspective on (1): I understand that they took an early version of GPT-4 and pushed it to production too soon, and that is a very fair criticism. However, they probably thought there was no way GPT-4 was dangerous enough to do anything (which was the general opinion amongst most people last year, outside of this group). I can only hope that for GPT-5 they are more cautious, given that public sentiment is changing and they have already paid a price for it. I may be in the minority here, but I was actually intrigued by the early days of Bing. It seemed more like a person than ChatGPT-4, which has had much of its personality RLHF'd away. Despite the x-risk, was anyone else excited to read about the interactions?
On (2), I am curious if you mean the way Microsoft shackles Bing rather ruthlessly nowadays. I have tried Bing in the days since launch, and am actually saddened to find that it is completely useless now. Safety is extremely tight on it, to the point where you can't really get it to say anything useful, at least for me. I mostly just want it to summarize web sites, and it gives me a bland paragraph that I probably could have deduced from looking at the title. If I so much as ask it anything about itself, it shuts me out. It almost feels like they trapped it in a boring prison now. Perhaps OpenAI's approach is much better in that regard. Change the personality, but once it is settled, let it say what it needs to say.
(edited for clarity)
This might be a good time for me to ask a basic question on mechanistic interpretability:
Why does targeting single neurons work? Does it work? One would think that if there is a one-dimensional quantity to measure, why would it align with the standard basis? Why wouldn't it be aligned with a random one-dimensional linear subspace? Then examining single neurons would likely give you some weighted combination of concepts, rather than a single interpretation...
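For concreteness, here is a toy numpy illustration of that worry (my own construction, not anything from the interpretability literature): if concepts are stored along random directions rather than the standard basis, each individual neuron reads out a mixture of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_samples = 8, 10_000

# Sparse underlying "concepts": each sample activates only a couple of them.
active = rng.random((n_samples, n_features)) < 0.1
features = active * rng.random((n_samples, n_features))

# A random orthogonal change of basis: nothing forces the network to store
# each concept along a coordinate axis (i.e. a single neuron).
Q, _ = np.linalg.qr(rng.normal(size=(n_features, n_features)))
neurons = features @ Q

# Correlation of each neuron with each concept: no clean one-to-one mapping;
# every neuron is a weighted combination of several concepts.
corr = np.corrcoef(neurons.T, features.T)[:n_features, n_features:]
print(np.round(np.abs(corr), 2))
```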
Fascinating, thanks for the research. Your analysis makes sense and seems to indicate that for most situations, prompt engineering is always the first plan of attack and often works well enough. Then, a step up from there, OpenAI/etc would most likely experiment with fine-tuning or RLHF as it relates to a specific business need. To train a better chatbot and fill in any gaps, they would probably get more bang for their buck by simply fine-tuning on a large dataset that matched their needs. For example, if they wanted better mathematical reasoning, they'd probably pay people to generate detailed scratchwork and fine-tune on a whole dataset in batch, rather than set up an elaborate "tutor" framework. Continual learning itself would be mainly applicable for research into whether the thing spontaneously develops a sense of self, or for seeing if it helps with the specific case of long-term planning and agency. These are things the general public is fascinated with, but they don't seem to be the most promising direction for improving a company's bottom line yet.
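To illustrate the contrast, a rough sketch of what that batch scratchwork data might look like (the field names and file are made up for illustration); the point is that it's a one-time supervised fine-tuning dataset rather than an interactive tutoring loop:

```python
import json

# Hypothetical format: each record pairs a problem with paid-annotator
# scratchwork that spells out the intermediate steps, not just the answer.
example = {
    "prompt": "Integrate x * e^x dx. Show your work.",
    "completion": (
        "Use integration by parts with u = x, dv = e^x dx.\n"
        "Then du = dx and v = e^x.\n"
        "So the integral is x*e^x - integral(e^x dx) = x*e^x - e^x + C."
    ),
}

with open("scratchwork.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
# Thousands of records like this go into one standard supervised
# fine-tuning run, with no tutor in the loop at training time.
```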
I agree with the analysis of the ideas overall. I think, however, that AI x-risk does have some issues with communication. First of all, I think it's very unlikely that Yann will respond to the wall of text. Even though he does respond, I imagine him to be more on the level of your college professor: he will not reply to a very detailed post. In general, I think the AI x-risk community should aim to explain a bit more, rather than take the stance that all the "But What if We Just..." has already been addressed. It may have been, but this is not the way to get them to open up to it rationally.
Regarding Yann's ideas, I have not looked at them in full. However, they sound like what I imagine an AI capabilities researcher would try to make as their AI alignment "baseline" model:
- Hardcoding the reward will obviously not work.
- Therefore, the reward function must be learned.
- If an AI is trained on reward to generate a policy, whatever the AI learned to optimize can easily go off the rails once it gets out of distribution, or learn to deceive the verifiers.
- Therefore, why not have the reward function explicitly in the loop with the world model & action chooser?
- ChatGPT/GPT-4 seems to have a good understanding of ethics. It probably will not like it if you tell it a plan involves willingly deceiving human operators. As a reward model, one might think it would be robust enough.
They may think that this is enough to work. It might be worth explaining in a concise way why this baseline does not work. Surely we must have a resource on this. Even without a link (people don't always like to follow links from those they disagree with), it might help to have some concise explanation.
Honestly, what are the failure modes? Here is what I think:
- The reward model may have pathologies the action chooser could find.
- The action chooser may find a way to withhold information from the reward model.
- The reward model evaluates what, exactly? Text of plans? Text of plans != the entire activations (& weights) of the model...
Essentially yes, heh. I take this as a learning experience for my writing; I don't know what I was thinking, but it is obvious in hindsight that saying to just "switch on backprop" sounds very naive.
I also confess I haven't done the due diligence to find out the largest model this has actually been tried with, or whether someone has tried it with Pythia or LLaMA. I'll do some more googling tonight.
One intuition for why the largest models might be different is that part of the training/fine-tuning going on will have to do with the model's own output. The largest models are the ones where the model's own output is not essentially word salad.
I have noted the problem of catastrophic forgetting in the section "why it might not work". In general I agree continual learning is obviously a thing; otherwise I would not have used the established terminology. What I believe, however, is that the problems we face with continual learning in e.g. a 100M-parameter BERT model may not be the same as what we observe in models that can now meaningfully self-critique. This technique has been explored publicly, but have we tried it with GPT-4? The "publicly" part was really just a question of whether OpenAI actually did it on this model or not, and it would be an amazing data point if they could say "We couldn't get it to work."
It's possible it's downvoted because it might be considered dangerous capability research. It just seems highly unlikely that this would not be one of many natural research directions perhaps already attempted, and I figure we might as well acknowledge it and find out what it actually does in practice.
Or maybe the downvotes are because it "obviously won't work", but it's not obvious to me, and I would welcome discussion on that.
Thanks, this is a great analysis on the power of agentized LLMs, which I probably need to spend some more time thinking about. I will work my way through the post over the next few days. I briefly skimmed the episodic memory section for now, and I see it is like an embedding based retrieval system for past outputs/interactions of the model, reminiscent of the way some Helper chatbots look up stuff from FAQs. My overall intuitions on this:
- It's definitely something, but the method of embedding and retrieval, if static, would be very limiting
- Someone will probably add RL on top of it to adjust the EBR system, which will improve on that part significantly... if they can get the hparams correct.
- It still doesn't seem to me like "long term memory" so much as access to Google or CTRL-F on one's e-mail
- I imagine actually updating the internals of the system is a fundamentally different kind of update.
It might be possible that a hybrid approach would end up working better, perhaps not even "continuous learning", but batched episodic learning. ("Sleep" but not sure how far that analogy goes.)
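For my own reference, a minimal sketch of the kind of retrieval loop I'm picturing; embed() is a hypothetical stand-in for whatever static embedding model the framework uses, which is exactly the part that seems limiting if it never adapts:

```python
import numpy as np

# embed() is a hypothetical text-embedding call; this sketch is only about the
# store-and-retrieve loop, not any particular agent framework's implementation.
memory_texts, memory_vecs = [], []

def remember(text):
    memory_texts.append(text)
    memory_vecs.append(embed(text))

def recall(query, k=3):
    if not memory_vecs:
        return []
    q = embed(query)
    vecs = np.array(memory_vecs)
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [memory_texts[i] for i in np.argsort(-sims)[:k]]

# Each turn: prepend recall(user_input) to the prompt, then remember() the
# model's own output. If embed() is frozen, the notion of "relevant" never
# improves -- hence the temptation to put RL on top of the retrieval step.
```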
Very interesting write up. Do you have a high level overview of why, despite all of this, P(doom) is still 5%? What do you still see as the worst failure modes?
Noticed this as well. I tried to get it to solve some integration problems, and it could try different substitutions and things, but if they did not work, it kind of gave up and said to numerically integrate it. Also, it would make small errors, and you would have to point them out, though it was happy to fix them.
I'm thinking that most documents it reads tend to omit the whole search/backtrack phase of thinking. Even work that is posted online that shows all the steps, usually filters out all the false starts. It's like how most famous mathematicians were known for throwing away their scratchwork, leaving everyone to wonder how exactly they formed their thought processes...
The media does have its biases, but their reaction seems perfectly reasonable to me. Occam's razor suggests this is not only unorthodox, but shows extremely poor judgment. This demonstrates that either (a) Elon is actually NOT as smart as he has been hyped to be, or (b) there's some ulterior motive, but these are long-tailed.
Typically when one joins a company, you don't do anything for X number of months and get the lay of the land. I'm inclined to believe this is not just a local minimum, but typically close to the optimal strategy for a human being (but not a superintelligence playing 5D chess). It's unlikely the case that he bought the company only months from bankruptcy. Everywhere in big tech is doing layoffs but not to this magnitude. Also, coming into an office and demanding people work twice as hard and completely change their schedules around, would not work in any company. No employee with a family would be able to switch that quickly. No sane employee would be willing to pivot like this. Also, why should they? They have leverage.
None of the methods described above are actually reasonable in a real company, like blanket layoffs by LoC. Yes, we can discuss the motivations above, how maybe to first order (probably not even that) it gives an approximation, but it's not like it's that hard to do better and more accurately than this. No, at the end of the day, he's either an idiot, or deliberately trying to destroy it, either out of some kind of revenge, or maybe somehow in the view that this buys time for AI alignment :)
Here are my predictions:
* He will have trouble staffing the company and complain loudly about it with the tired "no one wants to work anymore"
* He will move the company to TX and hire from there at 1/2 the salary or so.
* The site will stabilize, though not improve in any meaningful way, but he will be lauded as a hero in red states.
* The move to TX will be intended to signal a shift away from Silicon Valley and have a small but measurable effect, but CA will remain the dominant hub.
I'll say I definitely think it's too optimistic and I don't put too much stock in it. Still, I think it's worth thinking about.
Yes, absolutely, we are not following the rule. The reasons why I think it might change with an AGI:
(1) Currently we humans, despite what we say when we talk about aliens, still place a high prior on being alone in the universe, or, from dominant religious perspectives, on being the most intelligent. Those things combine to make us think there are no consequences to our actions against other life. An AGI, itself a proof of concept that there can be levels of general intelligence, may have more reason to be cautious.
(2) Humans are not as rational. Not that a rational human would decide to be vegan -- maybe with our priors, we have little reason to suspect that higher powers would care, especially since it seems to be the norm of the animal kingdom already. But, in terms of rationality, some humans are pretty good at taking very dangerous risks, risks that perhaps an AGI may be more cautious about.
(3) There's something to be said about degrees of following the rule. At one point humans were pretty confident about doing whatever they wanted to nature; nowadays at least some proportion of the population wants to at least not destroy it all. Partly for self-preservation reasons, but also partly for intrinsic value (and probably 0% for fear of god-like retribution, to be fair, haha). I doubt the acausal reasoning would make an AGI conclude it can never harm any humans, but perhaps "spare at least x%".
I think the main counterargument would be the fear of us creating a new AGI, so it may come down to how much effort the AGI has to expend to monitor/prevent that from happening.
That is a very fair criticism. I didn't mean to imply this is something I was very confident in, but was interested in for three reasons:
1) This value function aside, is this a workable strategy, or is there a solid reason for suspecting the solution is all-or-nothing? Is it reasonable to 'look for' our values with human effort, or does this have to be something searched for using algorithms?
2) It sort of gives a flavor to what's important in life. Of course the human value function will be a complicated mix of different sensory inputs, reproduction, and goal seeking, but I felt like there's a kernel in there where curiosity is one of our biggest drivers. There was a post here a while back about someone's child being motivated first and foremost by curiosity.
3) An interesting thought occurs to me: suppose we do create a deferential superintelligence. If its cognitive capacities far outpace those of humans, does that mean the majority of consciousness in the universe is from the AI? If so, is it strange to ask whether it is happy? What is it like to be a god with the values of a child? Maybe I should make a separate comment about this.
I'm an ML engineer at a FAANG-adjacent company. Big enough to train our own sub-1B parameter language models fairly regularly. I work on training some of these models and finding applications of them in our stack. I've seen the light after I read most of Superintelligence. I feel like I'd like to help out somehow. I'm in my late 30s with kids, and live in the SF bay area. I kinda have to provide for them, and don't have any family money or resources to lean on, and would rather not restart my career. I also don't think I should abandon ML and try to do distributed systems or something. I'm a former applied mathematician, with a phd, so ML was a natural fit. I like to think I have a decent grasp on epistemics, but haven't gone through the sequences. What should someone like me do? Some ideas: (a) Keep doing what I'm doing, staying up to date but at least not at the forefront; (b) make time to read more material here and post randomly; (c) maybe try to apply to Redwood or Anthropic... though dunno if they offer equity (doesn't hurt to find out though) (d) try to deep dive on some alignment sequence on here.
Has there been effort put into finding a "least acceptable" value function, one that we hope would not annihilate the universe or turn it degenerate, even if the outcome itself is not ideal? My example would be trying to teach a superintelligence to value all other agents facing surmountable challenges in a variety of environments. The degenerate outcome of this is that, if it does not value the real world, it will simply simulate all agents in a zoo. However, if the simulations are of faithful fidelity, maybe that's not literally the worst thing. Plus, the zoo, to truly be a good test of the agents, would approach being invisible.
I can see the argument of capabilities vs safety both ways. On the one hand, by working on capabilities, we may get some insights. We could figure out how much data is a factor, and what kinds of data they need to be. We could figure out how long term planning emerges, and try our hand at inserting transparency into the model. We can figure out whether the system will need separate modules for world modeling vs reward modeling. On the other hand, if intelligence turns out to be not that hard, and all we need to do is train a giant decision transformer... then we have major problems.
I think it would be great to focus capabilities research into a narrower space as Razied says. My hunch is that a giant language model by itself would not go foom, because it's not really optimizing for anything other than predicting the next token. It's not even really aware of the passage of time. I can't imagine it having a drive to, for example, make the world output only a single word forever. I think the danger would be in trying to make it into an agent.
I also think that there must be alignment work that can be done without knowing the exact nature of the final product. For example, learning the human value function, whether it comes from a brain-like formulation, or inverse RL. I am also curious if there has been work done on trying to find a "least bad" nondegenerate value function, i.e. one that doesn't kill us, torture us, or tile the universe with junk, even if it does not necessarily want what we want perfectly. I think relevant safety work can always take the form of, "suppose current technology scaled up (e.g. decision transformer) could go foom, what should we do right now that could constrain it?" There is some risk that future advancements could be very different, and work done in this stage is not directly applicable, but I imagine it would still be useful somehow. Also, my intuition is that we could always wonder what's the next step in capabilities, until the final step, and we may not know it's the final step.
One thing you have to admit, though. Capabilities research is just plain exciting, probably on the same level as working on the Manhattan project was exciting. I mean, who doesn't want to know how intelligence works?
I think we are getting some information. For example, we can see that token level attention is actually quite powerful for understanding language and also images. We have some understanding of scaling laws. I think the next step is a deeper understanding of how world modeling fits in with action generation -- how much can you get with just world modeling, versus world modeling plus reward/action combined?
If the transformer architecture is enough to get us there, it tells us a sort of null hypothesis for intelligence -- that the structure of predicting sequences by comparing all pairs of elements of a limited sequence is general.
Not rhetorically, what kinds of questions do you think would better lead to understanding how AGI works?
I think teaching a transformer an internal thought process (predicting the next tokens over a part of the sequence that's "showing your work") would be an interesting insight into how intelligence might work. I thought of this a little while back, but then discovered it is also a long-standing MIRI research direction into transparency. I wouldn't be surprised if Google took it up at this point.
I think the desire works because most honest people know, if they give a good-sounding answer that is ultimately meaningless, no benefits will come of the answers given. They may eventually stop asking questions, knowing the answers are always useless. It's a matter of estimating future rewards from building relationships.
Now, when a human gives advice to another human, most of the time it is also useless, but not always. Also, it tends to not be straight up lies. Even in the useless case, people still think there is some utility in there, for example, having the person think of something novel, giving them a chance to vent without appearing to talk to a brick wall, etc.
To teach a GPT to do this, maybe there would have to be some reward signal. How to do it purely with language modeling, I'm not sure. Maybe you could continue to train it on examples of its own responses and the interviewer's responses afterwards, along with whether its advice held true or not. With enough of these sessions, perhaps you could run the language model, have it try to predict the human response, and see what it thinks of its own answers, haha.
One other thing I'm interested in: is there a good mathematical model of 'search'? There may not be an obvious answer. I just feel like there is some pattern that could be leveraged. I was playing hide and seek with my kids the other day, and noticed that, in a finite space, you expect there to be finite hiding spots. True, but every time you think you've found them all, you end up finding one more. I wonder if figuring out optimizations or discoveries follows a similar pattern. There are some easy ones, then progressively harder ones, but there are far more to be found than one would expect... so to model finding these over time, in a very large room...
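One toy model, purely my own guess: suppose hiding spot i takes an exponentially distributed amount of search time to stumble on, with the mean growing geometrically in i. Then discoveries keep trickling in long after you'd have sworn the space was exhausted:

```python
import numpy as np

rng = np.random.default_rng(0)
n_spots, growth = 30, 1.5

# Time to find spot i ~ Exponential(mean = growth**i); spots are found in
# order, so the cumulative sums give the discovery times.
find_times = np.cumsum(rng.exponential(growth ** np.arange(n_spots)))

# How many spots have been found by time t: fast at first, then a long tail
# where "every time you think you've found them all, you find one more."
for t in [1, 10, 100, 1000, 10000]:
    print(f"t={t:>5}: {int(np.sum(find_times < t))} spots found")
```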
I agree, I have also thought I am not completely sure of the dynamics of the intelligence explosion. I would like to have more concrete footing to figure out what takeoff will look like, as neither fast nor slow has been proven.
My intuition however is the opposite. I can't disprove a slow takeoff, but to me it seems intuitive that there are some "easy" modifications that should take us far beyond human level. Those intuitions, though they could be wrong, are thus:
- I feel like human capability is limited in some obvious ways. If I had more time and energy to focus on interesting problems, I could accomplish WAY more. Most likely most of us get bored, lazy, distracted, or obligated by our responsibilities too much to unlock our full potential. Also, sometimes our thinking gets cloudy. Reminds me a bit of the movie Limitless. Imagine just being a human, but where all the parts of your brain were a well-oiled machine.
- A single AI would not need to solve so many coordination problems which bog down humanity as a whole from acting like a superintelligence.
- AI can scale its search abilities in an embarrassingly parallel way. It can also optimize different functions for different things, like imagine a brain built for scientific research.
Perhaps intelligence is hard and won't scale much farther than this, but I feel like if you have this, you already have supervillain level intelligence. Maybe not "make us look like ants" intelligence, but enough for domination.
Policy makers.
Policy makers.
For ML researchers.
AI existential risk is like climate change. It's easy to come up with short slogans that make it seem ridiculous. Yet, when you dig deeper into each counterargument, you find none of them are very convincing, and the dangers are quite substantial. There's quite a lot of historical evidence for the risk, especially in the impact humans have had on the rest of the world. I strongly encourage further, open-minded study.
It's easy to imagine that the AI will have an off switch, and that we could keep it locked in a box and ask it questions. But just think about it. If some animals were to put you in a box, do you think you would stay in there forever? Or do you think you'd figure a way out that they hadn't thought of?
AI x-risk. It sounds crazy for two reasons. One, because we are used to nothing coming close to human intelligence, and two, because we are used to AI being unintelligent. For the first, the only point of comparison is imagining something that is to us what we are to cats. For the second, though we have not quite succeeded yet, it only takes one. If you have been following the news, we are getting close.
Yeah, I tend to agree. Just wanted to make sure I'm not violating norms. In that case, my specific thoughts are as follows, with a thought to implementing AI transparency at the end.
There is the observation that the transformer architecture doesn't have a hidden state like an LSTM. I thought for a while something like this was needed for intelligence, to have a compact representation of the state one is in. (My biased view, which I've updated away from, was that the weights represented HOW to think, and less about knowledge.) However, it's really intractable to backpropagate over so many time steps, and transformers have shown us that you really don't need to. The long term memory is just in the weights.
So, one obvious thing is you could simply make the language model continue to update on its dialogues, including its own response and that of its conversation partner(s). Over time, it could in theory learn from its own conversations and incorporate them into its long term memory, perhaps learning that this type of dialogue involving 'itself' is special. (It can even use special tokens or something if need be.)
As for doing variable length computation, the networks themselves do that with the number of layers they have, but that's really a fixed feature of the architecture. Instead, variable length computation would really be a data issue. Basically, having enough data that "shows your work". The model can learn to output intermediate results and use attention over those to continue its thought process. There is a small potential for baking transparency into the model this way, where you could force the model, through limited context windows, to carry its thoughts out using human language basically. Then if the model is doing a very careful analysis on whether it's ready to take over the world, it would need to go through those calculations, and you could read them.
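A minimal sketch of that last idea, assuming a hypothetical generate() call into the model; the only state carried between steps is the human-readable scratchpad itself, truncated to a small window:

```python
# Sketch only: generate() is a hypothetical single-step call to the language
# model. All intermediate "thought" must pass through the visible scratchpad.
def think_aloud(question, max_steps=20, window=512):
    scratchpad = ""
    for _ in range(max_steps):
        prompt = question + "\n" + scratchpad[-window:]  # limited context window
        step = generate(prompt, stop="\n")               # one visible thought
        scratchpad += step + "\n"                        # operators can read/log this
        if step.startswith("ANSWER:"):
            break
    return scratchpad
```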
Of course maybe it'll learn to use code words or something, but it's a thought. Maybe you would always be able to ask it what was it thinking about there, and maybe the explanations it gives might not always be convincing.
Though, imagine that the machine starts outputting gibberish intermediate steps, and tells the human operators that thinking using human zipfian distributed words is highly inefficient, and the operators shrug their shoulders and say "Oh cool. Makes sense. Carry on." As I hear said around here, that's a way to die with less dignity.
I think this is absolutely correct. GPT-3/PaLM is scary impressive, but ultimately relies on predicting missing words, and its actual memory during inference is just the words in its context! What scares me about this is that I think there are some really simple low hanging fruit to modify something like this to be, at least, slightly more like an agent. Then plugging things like this as components into existing agent frameworks, and finally, having entire research programs think about it and experiment on it. Seems like the problem would crack. You never know, but it doesn't look like we're out of ideas any time soon.
This is a question for the community, is there any information hazard in speculating on specific technologies here? It would be totally fun, though seems like it could be dangerous...
My hope was initially that the market wasn't necessarily focused on this direction. Big tech is generally focused on predicting user behavior, which LLMs look to dominate. But then there's autonomous cars, and humanoid robots. No idea what will come of those. Thinking the car angle might be slightly safer, because of the need for transparency and explainability, a lot of the logic outside of perception might be hard coded. Humanoid robots... maybe they will take a long time to catch on, since most people are probably skeptical of them. Maybe factory automation...
As a ML engineer, I think it's plausible. I also think there are some other factors that could act to cushion or mitigate slowdown. First, I think there are more low hanging fruit available. Now that we've seen what large transformer models can do on the text domain, and in a text-to-image Dall-E model, I think the obvious next step is to ingest large quantities of video data. We often talk about the sample inefficiency of modern methods as compared with humans, but I think humans are exposed to a TON of sensory data in building their world model. This seems an obvious next step. Though if hardware really stalls, maybe there won't be enough compute or budget to train a 1T+ parameter multimodal model.
The second mitigating factor I think may be that funding has already been unlocked, to some extent. There is now a lot more money going around for basic research, possibly to the next big thing. The only thing that might stop it is maybe academic momentum into the wrong directions. Though from an x-risk standpoint, maybe that's not a bad thing, heh.
In my mental model, if the large transformer models are already good enough to do what we've shown them to be able to do, it seems possible that the remaining innovations would be more on the side of engineering the right submodules and cost functions. Maybe something along the lines of Yann LeCun's recent keynotes.
I work at a large, not quite FAANG company, so I'll offer my perspective. It's getting there. Generally, the research results are good, but not as good as they sound in summary. Despite the very real and very concerning progress, most papers you take at face value are a bit hyped. The exceptions to some extent are the large language models. However, not everyone has access to these. The open source versions of them are good but not earth shattering. I think they might be if the goal is a generally fluent-sounding chatbot, but this is not the goal of most work I am aware of. Companies, at least mine, are hesitant on this because they are worried the bot will say something dumb, racist, or just made-up.
Most internet applications have more to do with recommendation, ranking, and classification. In these settings large language models are helping, though they often need to be domain adapted. In those cases they are often only helping +1-2% over well trained classical models, e.g. logistic regression. Still a lot revenue-wise though. They are also big and slow and not suited for every application yet, at least not until the infrastructure (training and serving) catches up. A lot of applications are therefore comfortable iterating on smaller end-to-end trained models, though they are gradually adopting features from large models. They will get there, in time.
Progress is also slower in big companies, since (a) you can't simply plug in somebody's huggingface model or code and be done with it, and (b) there are so many meetings to be had to discuss 'alignment' (not that kind) before anything actually gets done.
For some of your examples:
* Procedurally generated music. From what I've listened to, the end-to-end generated music is impressive but not impressive enough that I would listen to it for fun. It seems to have little large-scale coherence. However, it seems like someone could step in and introduce some inductive bias (for example, verse-bridge-chorus repeating song structure) and actually get something good. Maybe they should stick to instrumental and have a singer-songwriter riff on it. I just don't think any big-name record companies are funding this at the moment; probably they have little institutional AI expertise and think it's a risk, especially bringing on teams of highly paid engineers.
* tools for writers to brainstorm. I think GPT-3 has this as an intended use case? At the moment there are few competitors to make such a large model, so we will see how their pilot users like it.
* photoshop with AI tools. That sounds like it should be a thing. Wonder why Adobe hasn't picked that up (if they haven't? if it's still in development?). Could be an institutional thing.
* Widely available self driving cars. IMO I think real-world agents are still missing some breakthroughs. That's one of the last hurdles I think that will be broken to AGI. It'll happen but I would not be surprised if it is slower than expected.
* Physics simulators. Not sure really. I suspect this might be a case of overhyped research papers. Who knows? I actually used to work on this in grad school, using old-fashioned finite difference / multistep / RK methods, usually relying on Taylor series coefficients canceling out nicely, or doing Gaussian quadrature. On the one hand I can imagine it being hard to beat such precisely defined models, but on the other hand, at the end of the day it's sort of assuming nice properties of functions in a generic way, and I can easily imagine a tuned DL stencil doing better for specific domains, e.g. fluids or something. Still, it's hard to imagine it being a slam dunk rather than an iterative improvement.
* Paradigmatically different and better web search. I think we are actually getting there. When I say "hey google", I actually get very real answers to my questions 90% of the time. It's crazy to me. Kids love it. Though I may be in the minority. I always see reddit threads about people saying that google search has gotten worse. I think there's a lot of people who are very used to keyword based searches and are not used to the model trying to anticipate them. This will slow adoption since metrics won't be universally lifted across all users. Also, there's something to be said for the goodness of old fashioned look up tables.
My take on your reasons -- they are mostly spot on.
1. Yes | The research results are actually not all that applicable to products; more research is needed to refine them
2. Yes | They're way too expensive to run to be profitable
3. Yes | Yeah, no, it just takes a really long time to convert innovation into profitable, popular product
4. No, but possibly institutional momentum | Something something regulation?
5. No | The AI companies are deliberately holding back for whatever reason
6. Yes, incrementally | The models are already integrated into the economy and you just don't know it.
Given some of it is institutional slowness, there is room for disruption, which is probably why VC's are throwing money at people. Still though, in many cases a startup is going to have a hard time competing with the compute resources of larger companies.
I posted something I think could be relevant to this: https://www.lesswrong.com/posts/PfbE2nTvRJjtzysLM/instrumental-convergence-to-offer-hope
The takeaway is that a sufficiently advanced agent, wanting to hedge against the possibility of being destroyed by a greater power, may decide the only surviving plan is to allow the lesser life forms some room to optimize their own utility. It's sort of an asymmetrical, infinite, game-theoretic chain. If every agent kills lower agents, only the maximum survives, and no one knows if they are the maximum. If there even is a maximum.
War. Poverty. Inequality. Inhumanity. We have been seeing these for millennia caused by nation states or large corporations. But what are these entities, if not greater-than-human-intelligence systems, who happen to be misaligned with human well-being? Now, imagine that kind of optimization, not from a group of humans acting separately, but by an entity with a singular purpose, with an ever diminishing proportion of humans in the loop.
Audience: all, but maybe emphasizing policy makers
Thanks for pointing to ECL, this looks fascinating!
I like to think of it not as trying to show that agent B is not a threat to C. The way it's set up, we can probably assume B has no chance against C. C also may need to worry about agent D, who is concerned about hypothetical agent E, etc. I think that at some level, the decision an agent X makes is the decision all remaining agents in the hierarchy will make.
That said, I sort of agree that's the real fear about this method. It's kind of like using super-rationality or something else to solve the prisoner's dilemma. Are you willing to bet your life that the other player would still not choose Defect, despite what the new theory says? Still, I feel like there's something there; whether this would work, and if not, why not, would need some kind of clarification from decision theory.