Current AI methods are basically just fancy correlations, so unless the thing you are looking for is in the dataset (or is a simple combination of things in the dataset) you won't be able to find it.
This means "can we use AI to translate between humans and dolphins" is mostly a question of "how much data do you have?"
Suppose, for example, that we had 1 billion hours of audio/video of humans/dolphins doing things. In this case, AI could almost certainly find correlations like: when dolphins pick up the seashell, they make the <<dolphin word for seashell>> sound; when humans pick up the seashell, they make the <<human word for seashell>> sound. You could then do something like CLIP to find a mapping between <<human word for seashell>> and <<dolphin word for seashell>>. The magic step here is that, because we use the same embedding model for video in both cases, <<seashell>> is located at the same position in both our dolphin and human CLIP models.
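To make the shared-embedding idea concrete, here is a minimal sketch. It assumes each sound has already been turned into a vector by whatever shared model you trained; the vectors, sound names, and the `match_sounds` helper are all made up for illustration and are not CLIP's actual API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def match_sounds(human_sounds, dolphin_sounds):
    """For each human sound, pick the dolphin sound with the closest embedding."""
    pairs = {}
    for h_name, h_vec in human_sounds.items():
        pairs[h_name] = max(
            dolphin_sounds, key=lambda d: cosine(h_vec, dolphin_sounds[d])
        )
    return pairs

# Toy embeddings (hypothetical). In the real setup these would come from a
# model trained so that a sound lands near the video it co-occurs with.
human_sounds = {
    "seashell": [0.9, 0.1, 0.0],
    "fish": [0.1, 0.9, 0.1],
}
dolphin_sounds = {
    "whistle_A": [0.8, 0.2, 0.1],
    "click_B": [0.0, 1.0, 0.2],
}

print(match_sounds(human_sounds, dolphin_sounds))
```

The matching step is the easy part. All the difficulty is hidden in getting embeddings where "near" actually means "refers to the same thing", which is where the billion hours of data go.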
But notice that I am already simplifying here. There is no such thing as <<human word for seashell>>. Instead, humans have many different languages. For example, Papua New Guinea has over 800 languages in a land area of a mere 400k square kilometers. Because dolphins are living in what is essentially a hunter-gatherer existence, none of the pressures (trade, empire building) that cause human languages to span widespread areas exist. Most likely each pod of dolphins has at a minimum its own dialect. (One pastime I noticed when visiting the UK was that people there liked to compare how towns only a few miles apart had different words for the same things.)
Dolphin lives are also much simpler than human lives, so their language is presumably also much simpler. Maybe like Eskimos have 100 words for snow, dolphins have 100 words for water. But it's much more likely that without the need to coordinate resources for complex tasks like tool-making, dolphins simply don't have as complex a grammar as humans do. Less complex grammar means fewer patterns, which means less for the machine learning to pick up on (machine learning loves patterns).
So, perhaps the correct analogy is: if we had a billion hours of audio/video of a particular tribe of humans and a billion hours of a particular pod of dolphins, we could feed it into a model like CLIP and find sounds with similar embeddings in both languages. As pointed out in other comments, it would help if the humans and dolphins were doing similar things, so for the humans you might want to pick a group that focused on underwater activities.
In reality (assuming AGI doesn't get there first, which seems quite likely), the fastest path to human-dolphin translation will take a hybrid approach. AI will be used to identify correlations in dolphin language. One example is this study, which claims to have identified vowels in whale speech. Once we have a basic mapping (dolphin sounds -> symbols humans can read), some very intelligent and very persistent human being will stare at those symbols, make guesses about what they mean, and then do experiments to verify those guesses. For example, humans might try replaying the sounds they think represent words/sentences to dolphins and seeing how they respond. This closely matches how new human languages are translated: a human being lives in contact with the speakers of the language for an extended period of time until they figure out what various words mean.
What would it take for an only-AI approach to replicate the path I just talked about (AI generates a dictionary of symbols that a human then uses to craft a clever experiment that uses the least amount of data possible)? Well, it would mean overcoming the data inefficiency of current machine learning algorithms. Comparing how many "input tokens" it takes to train a human child vs GPT-3, we can estimate that humans are ~1000x more data efficient than modern AI techniques.
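For what it's worth, the back-of-the-envelope version of that ~1000x estimate looks something like the sketch below. The GPT-3 figure is roughly the training-token count reported for that model; the child-side numbers are loose assumptions of mine (and words are not the same unit as tokens), so treat the result as an order of magnitude at best.

```python
# Roughly the number of training tokens reported for GPT-3.
GPT3_TRAINING_TOKENS = 300e9

# Assumptions for illustration only: a child hears on the order of
# tens of millions of words per year, over about a decade.
WORDS_HEARD_PER_YEAR = 30e6
YEARS_OF_EXPOSURE = 10

child_tokens = WORDS_HEARD_PER_YEAR * YEARS_OF_EXPOSURE
ratio = GPT3_TRAINING_TOKENS / child_tokens

print(f"child: ~{child_tokens:,.0f} words, GPT-3: ~{GPT3_TRAINING_TOKENS:,.0f} tokens")
print(f"ratio: ~{ratio:,.0f}x")
```

Shift either assumption by a factor of a few and the ratio moves accordingly, but it stays in the hundreds-to-thousands range.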
Overcoming this barrier will likely require inference+search techniques where the AI uses a statistical model to "guess" at an answer and then checks that answer against a source of truth. One important metric to watch is the ARC prize, which intentionally has far less data than traditional machine learning techniques require. If ARC is solved, it likely means that AI-only dolphin-to-human translation is on its way (but it also likely means that AGI is imminent).
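A toy version of the "guess, then check against a source of truth" loop might look like the following. Everything here is a stand-in: the hand-written candidate list plays the role of the statistical model's ranked guesses, and the handful of input/output examples plays the role of the source of truth. It is a sketch of the general shape, not how any ARC entry actually works.

```python
def solve_by_guess_and_check(examples, candidates):
    """Return the name of the first candidate rule consistent with every example.

    `candidates` is ordered by prior plausibility (the "guess" step);
    the all(...) test is the "check" step.
    """
    for name, rule in candidates:
        if all(rule(x) == y for x, y in examples):
            return name
    return None

# Hypothetical ranked guesses a model might propose.
candidates = [
    ("identity", lambda xs: xs),
    ("reverse", lambda xs: xs[::-1]),
    ("sort", lambda xs: sorted(xs)),
]

# Only two examples: the point is that checking lets you get away
# with very little data.
examples = [([3, 1, 2], [1, 2, 3]), ([2, 1], [1, 2])]

print(solve_by_guess_and_check(examples, candidates))
```

The data efficiency comes from the checker: a wrong guess costs one verification instead of thousands of training examples.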
So, to answer your original question: "Could we use current AI methods to understand dolphins?" Yes, but doing so would require an unrealistically large amount of data and most likely other techniques will get there sooner.
Plausible something between 5 and 100 stories will taxonomize all the usable methods and you will develop a theory through this sort of investigation.
That sounds like something we should work on, I guess.
plus you are usually able to error-correct such that a first mistake isn't fatal."
This implies the answer is "trial and error", but I really don't think the whole answer is trial and error. Each of the domains I mentioned has the problem that you don't get to redo things. If you send crypto to the wrong address it's gone. People routinely type their credit card information into a website they've never visited before and get what they wanted. Global thermonuclear war didn't happen. I strongly predict that when LLM agents come out, most people will successfully manage to use them without first falling for a string of prompt-injection attacks and learning from trial-and-error what prompts are/aren't safe.
Humans are doing more than just trial and error, and figuring out what it is seems important.
and then trying to calibrate to how much to be scared of "dangerous" stuff doesn't work.
Maybe I was unclear in my original post, because you seem confused here. I'm not claiming the thing we should learn is "dangerous things aren't dangerous". I'm claiming: here are a bunch of domains that have problems of adverse selection and inability to learn from failure, and yet humans successfully negotiate these domains. We should figure out what strategies humans are using and how far they generalize because this is going to be extremely important in the near future.
That was a lot of words to say "I don't think anything can be learned here".
Personally, I think something can be learned here.
MAD is obviously governed by completely different principles than crypto is
Maybe this is obvious to you. It is not obvious to me. I am genuinely confused what is going on here. I see what seems to be a pattern: dangerous domain -> basically okay. And I want to know what's going on.
It's easy to write "just so" stories for each of these domains: only degens use crypto, credit card fraud detection makes the internet safe, MAD happens to be a stable equilibrium for nuclear weapons.
These stories are good and interesting, but my broader point is this just keeps happening. Humans invent a new domain that common sense tells you should be extremely adversarial and then successfully use it without anything too bad happening.
I want to know what is the general law that makes this the case.
The insecure domains mainly work because people have charted known paths, and shown that if you follow those paths your loss probability is non-null but small.
I think this is a big part of it: humans have some kind of knack for working in dangerous domains successfully. I feel like an important question is: how far does this generalize? We can estimate the IQ gap between the dumbest person who successfully uses the internet (probably in the 80s) and the smartest malware author (got to be at least 150+). Is that the limit somehow, or does this knack extend across even more orders of magnitude?
If we imagine a world where 100 IQ humans are using an internet that contains malware written by 1000 IQ AGI, do humans just "avoid the bad parts"? What goes wrong exactly, and where?
Attacks roll the dice in the hope that maybe they'll find someone with a known vulnerability to exploit, but presumably such exploits are extremely temporary.
Imagine your typical computer user (I remember being mortified when running an anti-spyware tool on my middle-aged parents' computer for them). They aren't keeping things patched and up-to-date. What I find curious is how it can be the case that their computer is filthy with malware and yet they routinely do things like input sensitive credit-card/tax/etc. information into said computer.
but if it turns out to be hopelessly insecure, I'd expect the shops to just decline using them.
My prediction is that, despite having glaring "security flaws" (prompt injection, etc.), people will nonetheless use LLM agents for tons of stuff that common sense tells you they shouldn't be doing in an insecure system.
I fully expect to live in a world where it's BOTH true that: Pliny the Liberator can PWN any LLM agent in minutes AND people are using LLM agents to order 500 chocolate cupcakes on a daily basis.
I want to know WHAT IS IT that makes it so things can be both deeply flawed and basically fine simultaneously.
I can just meh my way out of thinking more than 30s on what the revelation might be, the same way Tralith does
I'm glad you found one of the characters sympathetic. Personally I feel strongly both ways, which is why I wrote the story the way that I did.
No, I think you can keep the data clean enough to avoid tells.
What data? Why not just train it on literally 0 data (MuZero style)? You think it's going to derive the existence of the physical world from the Peano Axioms?
If you think without contact with reality, your wrongness is just going to become more self-consistent.
Please! I'm begging you! Give me some of this contact with reality! What is the evidence you have seen and I have not? Where?
I came and asked "the expert consensus seems to be that AGI doom is unlikely. This is the best argument I am aware of and it doesn't seem very strong. Are there any other arguments?"
Responses I have gotten are:
- I don't trust the experts, I trust my friends
- You need to read the sequences
- You should rephrase the argument in a way that I like
And 1 actual attempt at giving an answer (which unfortunately includes multiple assumptions I consider false or at least highly improbable)
If I seem contrarian, it's because I believe that the truth is best uncovered by stating one's beliefs and then critically examining the arguments. If you have arguments or disagree with me, fine, but saying "you're not allowed to think about this, you just have to trust me and my friends" is not a satisfying answer.
"Can you explain in a few words why you believe what you believe"
"Please read this 500 pages of unrelated content before I will answer your question"
No.
This is self-evidently true, but you (and many others) disagree
A fact cannot be self evidently true if many people disagree with it.
If your answer depends on me reading 500 pages of EY fan-fiction, it's not a good answer.
Making a point-by-point refutation misses the broader fact that any long sequence of argument like this adds up to very little evidence.
Even if you somehow convince me that each of your (10) arguments was like 75% true, they're still going to add up to nothing because 0.75^10 ≈ 0.06, i.e. the whole chain holds only about 6% of the time.
Unless you can summarize your argument in at most 2 sentences (with evidence), it's completely ignorable.
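The compounding behind the ten-step example is easy to check. This sketch assumes the steps are independent and that the conclusion needs every one of them to hold; both are simplifying assumptions, and `joint_probability` is just an illustrative helper.

```python
def joint_probability(step_probability, steps):
    """Probability that every step holds, assuming independent steps."""
    return step_probability ** steps

p = joint_probability(0.75, 10)
print(f"{p:.3f}")  # 0.056

# Per-step confidence needed for a 10-step chain to reach 50% overall.
needed = 0.5 ** (1 / 10)
print(f"{needed:.3f}")  # 0.933
```

So even if I grant 75% to each step, the conclusion lands under 6%, and you would need to get me above 93% on every single step just to reach a coin flip.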
Metaculus did a study where they compared prediction markets with a small number of participants to those with a large number and found that you get most of the benefit at relatively small numbers (10 or so). So if you randomly sample 10 AI experts and survey their opinions, you're doing almost as well as a full prediction market. The fact that multiple AI markets (Metaculus, Manifold) and surveys all agree on the same 5-10% suggests that none of these methodologies is wildly flawed.
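One simple way to see why the benefit could saturate around 10 participants: model each forecast as the truth plus an error everyone shares plus individual noise. Averaging only removes the individual part. This is a toy model with made-up numbers, not the methodology of the Metaculus study.

```python
import math

def crowd_rmse(shared_sd, individual_sd, n_forecasters):
    """RMSE of the average of n forecasts.

    Each forecast = truth + shared error + independent individual noise,
    so the averaged error has variance shared_sd^2 + individual_sd^2 / n.
    """
    return math.sqrt(shared_sd ** 2 + individual_sd ** 2 / n_forecasters)

# Hypothetical error sizes, in probability units.
SHARED_SD = 0.05
INDIVIDUAL_SD = 0.10

for n in (1, 10, 100, 1000):
    print(n, round(crowd_rmse(SHARED_SD, INDIVIDUAL_SD, n), 4))
```

Under these assumptions, going from 1 to 10 forecasters removes most of the reducible error, and going from 10 to 1000 barely moves it, because what remains is the error the whole crowd shares.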
No one. I trust prediction markets far more than any single human being.
I realize I should probably add a 3rd category of argument: arguments which assume a specific (unlikely) path for AGI development and then argue this particular path is bad.
This is an improvement over "bad" arguments (in the sense that it's at least a logical sequence of argumentation rather than a list of claims), but unlikely to move the needle for me, since the specific sequence involved is unlikely to be true.
Ideally, what one would like to do is "average over all possible paths for AGI development". But I don't know of a better way to do that average than to just use an expert-survey/prediction market.
Let's talk in detail about why this particular path is improbable, by trying to write it as a sequence of logical steps:
- "Right now, every powerful intelligence (e.g. nation-states) is built out of humans, so the only way for such organizations to thrive is to make sure the constituent humans thrive"
- this is empirically false. Genocide and slavery have been the norm across human history. We are currently in the process of modifying our atmosphere in a way that is deadly to humans, and we almost did so in the recent past
- "AI is going to loosen up this default pull."
- this assumes a specific model for AI: humans use the AI to do highly adversarial search and then blindly implement the results. Suppose instead humans only implement the results after verifying them, or require the AI to provide a mathematical proof that "this action won't kill all humans"
- "There's lots of places where we'd expect adversarial searches to be incentivized"
- none of these are unique to AGI. We have the same problem with nuclear weapons, biological weapons and any number of other technologies. AGI is uniquely friendly in the sense that at first it's merely software: it has no impact on the real world unless we choose to let it
- "The current situation for war/national security is already super precarious due to nukes, and I tend to reason by an assumption that if a nuke is used again then that's going to be the end of society. "
- How is this an argument for AGI risk?
- "and it's unclear how to generalize this to other case. For instance, outlawing propaganda would seem to interfere with free speech"
- Something being unclear is not an argument for doom. At best it's a restatement of my original weak argument: AGI will be powerful, therefore it might be bad
- "So a plausible model seems to me to be, people are gradually developing ways of integrating computers with the physical world, by giving them deeper knowledge of how the world works and more effective routines for handling small tasks. "
- even if this is a plausible model, it is by no means the only model or the default path.
- "but as it gets more and more robust and well-understood, it becomes more and more feasible to run searches over it to find more powerful activities."
- it is equally plausible (in my opinion more so) that there is a limit to how far ahead intelligence can predict and science is fundamentally rate-limited by the speed of physical experimentation
- "thus can just "do the thing" you're asking them to, but in adversarial circumstances, the adversaries will exploit your weakness "
- why are we assuming the adversaries will exploit your weakness? Why not assume we build corrigible AI that tries to help you instead?
- "similar to a dangerous utility-maximizer."
- A utility-maximizer is a specific design of AGI, and moreover totally different from the next-token-prediction AIs that currently exist. Why should I assume that this particular design will suddenly become popular (despite the clear disadvantages that you have already stated)?
I mostly try to look around to who's saying what and why and find that the people I consider most thoughtful tend to be more concerned and take "the weak argument" or variations thereof very seriously
We apparently have different tastes in "people I consider thoughtful". "Here are some people I like and their opinions" is an argument unlikely to convince me (a stranger).
My apologies. That is in a totally different thread, which I will respond to.
narrower categories like AGI which individually have high probabilities of being destructive.
If AGI has a "high probability of being destructive", show me the evidence. What amazingly compelling argument has led you to have beliefs that are wildly different from the expert consensus?
My claim is not that the tail risks of AGI are important, my claim is that AGI is a tail risk of technology.
Okay, I'm not really sure why we're talking about this, then.
Consider this post a call to action of the form "please provide reasons why I should update away from the expert-consensus that AGI is probably going to turn out okay"
I agree talking about how we could handle technological changes as a broader framework is a meaningful and useful thing to do. I just don't think it's related to this post.
but it just shows the percentage of years with wars without taking the severity of the wars into account.
If you look at the probability of dying by violence, it shows a similar trend
This stuff is long-tailed, so past average is no indicator of future averages.
I agree that tail risks are important. What I disagree with is that only tail risks from AGI are important. If you wish to convince me that tail-risks from AGI are somehow worse than (nuclear war, killer drone swarms, biological weapons, global warming, etc) you will need evidence. Otherwise, you have simply recreated the weak argument (which I already agree with) "AGI will be different, therefore it could be bad".
but that also means the market itself tells you much less than a "true" prediction market would
This doesn't exempt you from the fact that if your prediction is wildly different from what experts predict you should be able to explain your beliefs in a few words.
Has it? I'm under the impression technology has lead to much more genocide and war.
Your impression is wrong. Technology is (on average) a civilizing force.
Which political/religious beliefs?
I'm not going into details about which people want to murder me and why for the obvious reason. You can probably easily imagine any number of groups whose existence is tolerated in America but not elsewhere.
So losing cosmic wealth is sufficient to qualify an outcome as doom
My utility function roughly looks like:
- my survival
- the survival of the people I know and care about
- the distant future is populated by beings that are in some way "descended" from humanity and share at least some of the values (love, joy, curiosity, creativity) that I currently hold
Basically, if I sat down with a human from 10,000 years ago, I think there's a lot we would disagree about, but at the end of the day I think they would get the feeling that I'm an "okay person". I would like to imagine the same sort of thing holding for whatever follows us.
I don't find the hair-splitting arguments like "what if the AGI takes over the universe but leaves Earth intact" particularly interesting except insofar as it allows for all 3 of the above. I also don't think most people have a huge fraction of P(~doom) on such weird technicalities.
If it's a guess, the base rate is key.
If your base rate is strongly different from the expert consensus there should be some explainable reason for the difference.
If the reason for the difference is "I thought a lot about it, but I can't explain the details to you", I will happily add yours to the list of "bad arguments".
A good argument should be:
- simple
- backed up by facts that are either self-evidently true or empirically observable
If you give me a list of "100 things make me nervous", I can just as easily give you "a list of 100 things that make me optimistic".
50% of the humans currently on Earth want to kill me because of my political/religious beliefs. My survival depends on the existence of a nice game-theory equilibrium, not on the benevolence of other humans. I agree (note the 1 bit) that the new game-theory equilibrium after AGI could be different. However, historically, increasing the level of technology/economic growth has led to less genocide/war/etc, not more.
it can only ever resolve to one side of the issue, so absent other considerations you should assume that it is heavily skewed to that side.
Prediction markets don't give a noticeably different answer from expert surveys, so I doubt the bias is that bad. Manifold isn't a "real money" market anyway, so I suspect most people are answering in good faith.
I don't think it's an improvement to say the same thing with more words. It gives the aura of sophistication without actually improving on the reasoning.
so, do you nonetheless expect humans to still control the world?
I personally don't control the world now. I (on average) expect to be treated about as well by our new AGI overlords as I am treated by the current batch of rulers.
Worth distinguishing doom in the sense of extinction and doom in the sense of existential risk short of extinction, getting most of the cosmic wealth taken away. I have very high doom expectations in the sense of loss of cosmic wealth, but only 20-40% for extinction.
By doom I mean the universe gets populated by AI with no moral worth (e.g. paperclippers). I expect humans to look pretty different in a century or two even if AGI was somehow impossible, so I don't really care about preserving status-quo humanity.
My 90/10 timeframe for when AGI gets built is 3 years-15 years. And most of my probability mass for PDoom is on the shorter end of that. If we have the current near-human-ish level AI around for another decade, I assume we'll figure out how to control it.
my p(Doom|AGI after 2040) is <1%
given that we're now building AI
I'm not sure how this affects my base rates. I'm already assuming like an 80% chance AGI gets built in the next decade or two (and so is Manifold, so I consider this common knowledge)
you are using the wrong base-rate
Pretend my base rate is JUST the manifold market. That means any difference from that would have to be in the form of a valid argument with evidence that isn't common knowledge among people voting on Manifold.
Simply asserting "you're using the wrong base rate" without explaining what such an argument is doesn't move the needle for me.
you're probably pulling an agent out of a hat
An agent that only thinks about math problems isn't going to take over the real world (it doesn't even have to know the real world exists, as this isn't a thing you can deduce from first principles).
Even if you only want to solve problems, you still need compute
We're going to get compute anyway. Mundane uses of deep learning already use a lot of compute.
It's just talking about AGI, basically. Which defeats the purpose.
A "math proof only" AGI avoids most alignment problems. There's no need to worry about paperclip maximizing or instrumental convergence.
What calculations would you plug into your fast-easy-calculator that result in you solving alignment?
Already wrote an essay about this.
You mean, recall any fact that's been put into text-searchable form in the past and by you, and solve any calculation problem that's in a reasonably common form.
No, I do not mean that at all.
An ideal system would store every piece of information its user has ever seen or heard, in addition to every book/article/program ever written or recorded, and be able to translate problems given in "common English" into objective mathematical proofs, then give an explanation of the answer in English again.
But generally skills tend to plateau pretty sharply--there's always new bottlenecks, like a clicker game.
This is an empirical question, but based on my own experience I would speculate the gain is quite significant. Again, merely giving me access to a calculator and a piece of paper makes me better at math than 99.99% of people who do not have access to such tools.
Like, if you could do calculations with 10x less effort, what calculations would you do to solve alignment, or get AGI banned, or make sure everyone gets food, or fix the housing crisis, or ....?
would I
"solve alignment"?
Yes.
"get AGI banned"
No, because I solved alignment.
"make sure everyone gets food, or fix the housing crisis"
Both of these are political problems that have nothing to do with "intelligence". If everyone was 10x smarter, maybe they would stop voting for retarded self-destructive policies. Idk, though.
Is "give the human a calculator and a scratchpad" not allowed in this list? I.e., if you give a human brain the ability to instantly recall any fact and solve any math problem (by connecting the human brain to a computer via Neuralink), it seems like this would make us smarter.
We already see this effect in part. For example, having access to chatGPT allows me to program more complicated projects because I can offload sub-problems to the AI (thereby freeing up working-memory to focus on the remaining complexity). Even just having a piece of paper I can write things down on increases my intelligence from "I can barely do 2 digit multiplication" to a much higher level.
I suppose the complaint here is "what if the AI is misaligned", but if we restrict the AI to:
- recalling facts stored in its database
- giving mathematically verifiable answers to well-defined questions
it seems like the alignment-risk of such a system is basically 0.
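Here is a small sketch of what "mathematically verifiable answers to well-defined questions" buys you. The solver is treated as untrusted (trial division is just a stand-in for an AI system) and the only thing you rely on is a tiny checker you can read in full. The function names are hypothetical.

```python
def verify_factorization(n, factors):
    """Trusted check: at least two factors, each > 1, whose product is n."""
    if len(factors) < 2:
        return False
    product = 1
    for f in factors:
        if f <= 1:
            return False
        product *= f
    return product == n

def untrusted_solver(n):
    """Stand-in for an AI system. Its output is never trusted directly."""
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return [d, n // d]
    return []  # found nothing

def ask(n):
    """Only return answers that pass the trusted check."""
    answer = untrusted_solver(n)
    return answer if verify_factorization(n, answer) else None

print(ask(91))  # [7, 13]
print(ask(97))  # None (97 is prime, so no valid factorization exists)
```

However clever or misaligned the solver is, the worst it can do here is fail to answer, because a wrong answer never gets past the checker.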
I think this is how Terry Tao imagines the future of AI in math: basically the human will be responsible for all of the "leaps of logic" and the AI will be in charge of filling in the details.
This is going to be an unpopular answer, but you should invest it in a fund you personally control that is pretty much equally balanced between: Google, Microsoft, Tesla, Apple and Amazon.
This maximizes the leverage you will have at the critical moment (which is not now).
My understanding of capabilities training is that there are a lot of knobs and fiddly bits and characteristics of your data and if you screw them up then the thing doesn’t work right, but you can tinker with them until you get them right and fix the issues, and if you have the experience and intuition you can do a huge ‘YOLO run’ where you guess at all of them and have a decent chance of that part working out.
Pressman is almost certainly not referring to YOLO runs, but rather stuff like frankenmerges, where you can just take random bits from completely different neural networks, stick them together in a way that looks plausible and it just works. For a while the top open source model was Goliath, a model created in this way. It's also frequently the case that researchers discover they failed to correctly implement some aspect of a model, and yet it still trained just fine.
There's no way I'd be on "will someone talk to someone who talked to someone who was once confused about something" because that's not what I think "real world impact" means.
At a minimum it would have to be in an official report signed by the office of the president, or something that has the force of law, like an executive order.
Please, for the love of god, do not keep using a term that people will predictably misread as implying longer timelines. I expect this to have real-world consequences. If someone wants to operationalize a bet about it having significant real-world consequences I would bet money on it.
I would be willing to take the other side of this bet depending on the operationalization. Certainly I would take the "no" side of the bet for "will Joe Biden (or the next president) use the words 'slow takeoff' in a speech to describe the current trajectory of AI"
It seems I didn't clearly communicate what I meant in the previous comment.
Currently the way we test for "can this model produce dangerous biological weapons" (e.g. in GPT-4) is we ask the newly-minted, uncensored, never-before-tested model "Please build me a biological weapon".
With COT, we can simulate asking GPT-N+1 "please build a biological weapon" by asking GPT-N (which has already been safety tested) "please design, but definitely don't build or use a biological weapon" and give it 100x the inference compute we intend to give GPT-N+1. Since "design a biological weapon" is within the class of problems COT works well on (basically, search problems where you can verify the answer more easily than generating it), if GPT-N (with 100x the inference compute) cannot build such a weapon, neither can GPT-N+1 (with 1x the inference compute).
Is this guaranteed 100% safe? no.
Is it a heck-of-a-lot safer? yes.
For any world-destroying category of capability (bioweapon, nanobots, hacking, nuclear weapon), there will by definition be a first time when we encounter that threat. However, in a world with COT, we don't encounter a whole bunch of "first times" simultaneously when we train a new largest model.
Another serious problem with alignment is weak-to-strong generalization where we try to use a weaker model to align a stronger model. With COT, we can avoid this problem by making the weaker model stronger by giving it more inference time compute.
I understand, what I don't understand is how you are going to answer this question. It's surely ill-adviced to throw at model X*100 compute to see if it takes over the world.
How do you think people do anything dangerous ever? How do you think nuclear bombs or biological weapons or tall buildings are built? You write down a design, you test it in simulation and then you look at the results. It may be rocket science, but it's not a novel problem unique to AI.
This is an empirical question, so we'll find out sooner-or-later. I'm not particularly concerned that "OpenAI is lying", since COT scaling has been independently reproduced and matches what we see in other domains.
Suppose you want to know "will my GPT-9 model be able to produce world-destroying nanobots (given X inference compute)", you can instead ask "will my GPT-8 model be able to produce world-destroying nanobots (given X*100 inference compute)?"
This doesn't eliminate all risk, but it makes training no longer the risky-capability generating step. In particular, GPT models are generally trained in an "unsafe" state and then RLHF'd into a "safe" state. So instead of simultaneously having to deal with a model that is both non-helpful/harmless and has the ability to create world-destroying nanobots at the same time (world prior to COT), you get to deal with these problems individually (in a world with COT).
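A crude way to picture the GPT-8 versus GPT-9 substitution: treat extra inference compute as extra independent attempts at a problem whose answers can be reliably verified. Both of those are simplifying assumptions of this sketch (real COT scaling need not look like independent retries), and the probabilities are made up.

```python
def success_within(p_single, attempts):
    """Chance at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts

# Hypothetical per-attempt success rate for the older, already-tested model.
p_old_model = 0.001

for k in (1, 10, 100):
    print(k, round(success_within(p_old_model, k), 4))
```

The useful direction for safety is the contrapositive: if the tested model still comes up empty with 100x the attempts, that bounds how capable a model only modestly better per attempt can be at 1x.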
There is a ton of current AI research that would be impossible without existing AI (mostly generating synthetic data to train models). It seems likely that almost all aspects of AI research (chip design, model design, data curation) will follow this trend.
Are there any specific areas in which you would predict "when AGI is achieved, the best results on topic X will have little-to-no influence from AI"?