Comments
Eliminating the profit motive would likely mean that militaries develop dangerous AI a few years later.
I'm guessing that most people's main reason is that it looks easier to ban AI research than to sufficiently reduce the profit motive.
The belief in a universal, independent standard for altruism, morality, and right and wrong is deeply ingrained in societal norms.
That's true of the norms in WEIRD cultures. It is far from universal.
I expect such acausal collaboration to be harder to develop than good calibration, and therefore less likely to happen at the stage I have in mind.
Another report of progress: Mapping the Mind: Worm’s Brain Activity Fully Decoded (full paper).
the people choosing this many white cars seem low-level insane
The increase in white cars seems to follow a 2007 study, "An Investigation into the Relationship between Vehicle Colour and Crash Risk", which says light-colored cars are safer. Maybe it's just a coincidence.
Thank you for narrowing my confusion over what AI_0 does.
My top question now is: how long does AI_0 need to run, and why is it safe from other AIs during that period?
AI_0 appears to need a nontrivial fraction of our future lightcone to produce a decent approximation of the intended output. Yet keeping it boxed seems to leave the world vulnerable to other AIs.
I disagree. The macro environment is good enough that the Fed could easily handle any contraction, provided they focus on forward-looking indicators, such as the TIPS spread, or near-real-time indicators such as the ISM purchasing managers' index.
Now seems like a good time for the Fed to start decreasing interest rates.
On inflation, see Kevin Erdmann (also here).
This is less than half correct.
There's still a widespread labor shortage. A slowdown might mean significant unemployment in Silicon Valley, but it will mean a return to normal in most places.
Inflation is back to normal. It only looks high to people who are focused on lagging indicators such as the CPI.
Most of the problem with reference ranges is that they are usually just intended to reflect the range within which 95% of the reference population falls. That's much easier to measure than the range that indicates good health.
There isn't much incentive for any authority to establish guidelines for healthy ranges. So too many people end up equating "normal" results with good results, because normal is what gets quantified, and is usually what is reported on test results.
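As a toy illustration of how such a reference range gets derived (a minimal sketch; the analyte values and the 2.5th/97.5th percentile convention here are assumptions, not any particular lab's procedure):

```python
# Minimal sketch: a "normal" reference range is just the central 95% of a
# reference population, which is not the same thing as a healthy range.
import numpy as np

rng = np.random.default_rng(0)
reference_population = rng.normal(loc=100.0, scale=15.0, size=10_000)  # made-up lab values

low, high = np.percentile(reference_population, [2.5, 97.5])
print(f"Reported 'normal' range: {low:.1f} to {high:.1f}")
# Nothing in this calculation measures health outcomes, so "normal" results
# can't be equated with good results.
```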
I recommend comparing your ideas to a similar proposal in this post.
I see hints that a fair amount of value might be hiding in this post. Here's an attempt at rewriting the parts of it that I think I understand, with my own opinions shown in {braces}. I likely changed a good deal of the emphasis to reflect my worldview. I presume my comments will reveal some combination of my confused mangling of your ideas, and your cryptic communication style. I erred on the side of rewriting it from scratch to reduce the risk that I copy your text without understanding it. I'm posting a partial version of it in order to get feedback on how well I've understood the first half, before deciding how to tackle the harder parts.
Eliezer imagined a rapid AI takeoff via AIs being more logical (symbolic) than humans, enabling them to better compress evidence about reality, and to search more efficiently through the space of possible AI designs to more rapidly find improvements. {It's hard to pin down Eliezer's claims clearly here, since much of what he apparently got wrong was merely implicit, not well articulated. Why do you call it regression? Did Eliezer expect training data to matter?}
This was expected to produce a more coherent, unified mind than biological analogies suggested. Eliezer imagined that such minds are very sensitive to initial conditions. {I'm unclear whether framing this in chaos theory terms captures Eliezer's intent well. I'd frame it more in terms of first-mover advantage, broadly applied to include the most powerful parts of an AI's goal stomping out other goals within the AI.}
Recent advances in AI suggest that Eliezer overestimated the power of the kind of rigorous, symbolic thinking associated with math (and/or underestimated the power of connectionist approaches?).
Neural nets provide representations of knowledge that are smooth, in the sense that small changes in evidence / input generate small changes in how the resulting knowledge is encoded. E.g. as a small seedling slowly becomes tall enough to be classified as a tree, the neural net alters its representation from "slightly treelike" to "pretty treelike".
In contrast, symbolic approaches to AI have representations with sharp boundaries. This produces benefits in some conspicuous human interactions (e.g. we want to design the rules of chess so there's no room for a concept like "somewhat checkmated").
It wasn't obvious in advance which approach would work better for having an AI write better versions of its own code. We now have enough evidence to say that the neural net approach can more usefully absorb large amounts of data, while doing a tolerable job of creating sharp boundaries where needed.
One can imagine something involving symbolic AI that embodies knowledge in a form that handles pattern matching so as to provide functionality similar to neural networks. In particular, it would need to encode the symbolic knowledge in a way that improved versions of the symbolic source code are somehow "near" the AI's existing source code. This "nearness" would provide a smoothness that's comparable to what gradient descent exploits.
{Drexler's QNR? Combining symbolic and connectionist AI. Probably not close to what Eliezer had in mind, and doesn't look like it would cause a much faster takeoff than what Deep Learning suggests. QNR leaves much key knowledge in "inscrutable matrices", which I gather is incompatible with Eliezer's model.} {I'm guessing you use the term "short programs" to indicate that, in what might be Eliezer's model, code remains separate from the knowledge database, and the important intelligence increases can be accomplished by rewriting the code while leaving the database relatively constant? Unlike neural nets, where intelligence and a database need to be intertwined.} {I have little idea whether you're accurately portraying Eliezer's model here.}
Neural networks work because they are able to represent knowledge so that improved ideas are near existing ideas. That includes source code: when using neural nets to improve source code, that "nearness" enables a smooth, natural search for better source code.
Eliezer freaks out about foom due to the expectation that there's some threshold of intelligence above which a symbolic AI can do something as powerful as gradient descent on its own source code, presumably without the training phases that neural networks need. Existing research does not suggest that's imminent. We're in trouble if it happens before we have good ways to check each step for safety.
Oops, you're right. Section 36.6 does advocate modularity, in a way that hints at the vibe you describe. And my review of the CAIS paper did say things about modularity that seem less likely now than they did 4 years ago.
I agree that people have gotten vibes from the paper which have been somewhat discredited.
Yet I don't see how that vibe followed from what he wrote. He tried to clarify that having systems with specialized goals does not imply they have only narrow knowledge. See section 21 of the CAIS paper ("Broad world knowledge can support safe task performance").
Are people collapsing "AI with narrow goals" and "AI with only specialized knowledge" into one concept "narrow AI"?
Verified safe software means the battle shifts to vulnerabilities in any human who has authority over the system.
What I don’t understand is, either in my model or Critch’s, where we find more hope by declining a pivotal act, once one becomes feasible?
Part of the reason for more hope is that people are more trustworthy if they commit to avoiding the worst forms of unilateralist curses and world conquest. So by having committed to avoiding the pivotal act, leading actors became more likely to cooperate in ways that avoided the need for a pivotal act.
If a single pivotal act becomes possible, then it seems likely that it will also be possible to find friendlier pivotal processes that include persuading most governments to take appropriate actions. An AI that can melt nearly all GPUs will be powerful enough to scare governments into doing lots of things that are currently way outside the Overton window.
Cheap printing was likely a nontrivial factor, but was influenced by much more than just the character sets. Printing presses weren't very reliable or affordable until a bunch of component technologies reached certain levels of sophistication. Even after they became practical, most cultures had limited interest in them.
Filtering out entire sites seems too broad and too crude to have much benefit.
I see plenty of room to turn this into a somewhat good proposal by having GPT-4 look through the dataset for a narrow set of topics. Something close to "how we will test AIs for deception".
A good deal of this post is correct. But the goals of language models are more complex than you admit, and not fully specified by natural language. LLMs do something that's approximately a simulation of a human. Those simulated quasi-humans are likely to have quasi-human goals that are unstated and tricky to observe, for much the same reasons that humans have such goals.
LLMs also have goals that influence what kind of human they simulate. We'll know approximately what those goals are, due to our knowledge of what generated those goals. But how do we tell whether approximately is good enough?
No. I found a claim of good results here. Beyond that I'm relying on vague impressions from very indirect sources, plus fictional evidence such as the movie Latter Days.
Many rationalists do follow something resembling the book's advice.
CFAR started out with too much emphasis on lecturing people, but quickly noticed that wasn't working, and pivoted to more emphasis on listening to people and making them feel comfortable. This is somewhat hard to see if you only know the rationalist movement via its online presence.
Eliezer is far from being the world's best listener, and that likely contributed to some failures in promoting rationality. But he did attract and encourage people who overcame his shortcomings for CFAR's in-person promotion of rationality.
I consider it pretty likely that CFAR's influence has caused OpenAI to act more reasonably than it otherwise would act, due to several OpenAI employees having attended CFAR workshops.
It seems premature to conclude that rationalists have failed, or that OpenAI's existence is bad.
Sorry, it doesn’t look like the conservatives have caught on to this kind of approach yet.
That's not consistent with my experiences interacting with conservatives. (If you're evaluating conservatives via broadcast online messages, I wouldn't expect you to see anything more than tribal signaling).
It may be uncommon for conservatives to use effective approaches at explicitly changing political beliefs. That's partly because politics is less central to conservatives' lives. You'd likely reach a more nuanced conclusion if you compared how Mormons persuade people to join their religion, which incidentally persuades people to become more conservative.
Does the literature on the economics of reputation have ideas that are helpful?
I haven't thought this out very carefully. I'm imagining a transformer trained both to predict text, and to predict the next frame of video.
Train it on all available videos that show realistic human body language.
Then ask the transformer to rate on a numeric scale how positively or negatively a human would feel in any particular situation.
This does not seem sufficient for a safe result, but implies that LeCun is less nutty than your model of him suggests.
Why assume LeCun would use only supervised learning to create the IC module?
If I were trying to make this model work, I'd use mainly self-supervised learning that's aimed at getting the module to predict what a typical human would feel. (I'd also pray for a highly multipolar scenario if I were making this module immutable when deployed.)
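To make that concrete, here's a minimal sketch of the kind of model I'm imagining: a shared transformer trunk trained self-supervised on next-token prediction (with video frames assumed to be tokenized into the same stream), plus a scalar head that rates how positively or negatively a typical human would feel. All names, dimensions, and the tokenization scheme are my illustrative assumptions, not LeCun's actual IC module.

```python
# Minimal sketch, not LeCun's actual design: a shared trunk with a
# self-supervised next-token head and a scalar "valence" head.
import torch
import torch.nn as nn

class ValencePredictor(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, n_layers)
        self.next_token_head = nn.Linear(d_model, vocab_size)  # pretraining objective
        self.valence_head = nn.Linear(d_model, 1)              # scalar rating of the situation

    def forward(self, tokens):
        h = self.trunk(self.embed(tokens))
        # tanh keeps the rating in [-1, 1]: negative = a typical human feels bad.
        return self.next_token_head(h), torch.tanh(self.valence_head(h[:, -1]))

# Pretrain with the next-token loss on text plus tokenized video frames,
# then read the valence output for a tokenized description of a situation.
model = ValencePredictor()
situation = torch.randint(0, 32000, (1, 16))  # stand-in for a tokenized situation
_, valence = model(situation)
print(valence.item())
```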
Might this paradigm be tested by measuring LLM fluid intelligence?
I predict that a good test would show that current LLMs have modest amounts of fluid intelligence, and that LLM fluid intelligence will increase in ways that look closer to continuous improvement than to a binary transition from nothing to human-level.
I'm unclear whether it's realistic to get a good enough measure of fluid intelligence to resolve this apparent crux, but I'm eager to pursue any available empirical tests of AI risk.
Upvoted for clarifying a possibly important crux. I still have trouble seeing a coherent theory here.
I can see a binary difference between Turing-complete minds and lesser minds, but only if I focus on the infinite memory and implicitly infinite speed of a genuine Turing machine. But you've made it clear that's not what you mean.
When I try to apply that to actual minds, I see a wide range of abilities at general-purpose modeling of the world.
Some of the differences in what I think of as general intelligence are a function of resources, which implies a fairly continuous scale, not a binary distinction.
Other aspects are a function of accumulated knowledge. That's somewhat lumpier, but still doesn't look close to a binary difference.
Henrich's books The Secret of Our Success and The WEIRDest People in the World suggest that humans have been gradually building up the ability to handle increasingly abstract problems.
Our ancestors of a couple million years ago had language that enabled them to handle a somewhat larger class of mental tasks than other apes.
Tools such as writing, and new concepts such as the Turing machine, enabled humans to model ideas that they'd previously failed to find ways to handle.
I see plenty of hints that other mammals have weaker versions of this abstract thought. I'd be surprised if humans have reached the limits of what is possible.
So, when I try to treat general intelligence as a binary, I alternate between doubting that humans have it, and believing that most animals and LLMs have it.
In the hypothetical where there’s no general intelligence, there’s no such thing as “smarter”,
It sure looks like many species of animals can be usefully compared as smarter than others. The same is true of different versions of LLMs. Why shouldn't I conclude that most of those have what you call general intelligence?
If a hostile alien civilization notices us, we’re going to die. But if we’re going to die from the AGI anyway, who cares?
Anyone with a p(doom from AGI) < 99% should conclude that harm from this outweighs the likely benefits.
I'm guessing something like a 0.1% success rate. I think this is sufficient for success if you have automated the process and can afford to run it enough times to generate and test millions of possibilities. This is a largely parallelizable process, so it doesn't necessarily take much wall-clock time.
How much compute would it take to test a million of these in parallel? I assume you're imagining something less compute-intensive than retraining a million GPTs from scratch, but I'm unclear how much less compute-intensive.
How much evidence does it need per instance to figure out whether the change is an improvement? With a 0.1% success rate, it doesn't take much imperfection in the evaluations for most apparent improvements to be false positives.
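To make the false-positive worry concrete, here's a rough Bayes calculation; the 0.1% success rate is from above, and the evaluator error rates are numbers I made up for illustration.

```python
# With a 0.1% base rate of real improvements, even a fairly accurate
# evaluator flags mostly false positives.
p_real = 0.001               # assumed fraction of candidate changes that truly help
true_positive_rate = 0.90    # assumed chance the evaluator catches a real improvement
false_positive_rate = 0.02   # assumed chance it wrongly flags a non-improvement

flagged = p_real * true_positive_rate + (1 - p_real) * false_positive_rate
precision = p_real * true_positive_rate / flagged
print(f"Fraction of flagged 'improvements' that are real: {precision:.1%}")  # ~4%
```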
If months of debate with superforecasters didn’t accomplish much, that’s really disheartening
I participated in Tetlock's tournament. Most people devoted a couple of hours to this particular topic, spread out over months.
A significant fraction of the disagreement was about whether AI would be transformative this century. I made a bit of progress on this, but didn't get enough feedback to do much. AFAICT, many superforecasters know that reports of AI progress were mostly hype in prior decades, and are assuming that is continuing unless they see strong evidence to the contrary. They're typically not willing to spend much more than an hour looking for such evidence.
This post seems about 90% correct, and written better than your previous posts.
I expect nanotech will be more important, someday, than you admit. But I agree that it's unlikely to be relevant to foom. GPUs are close enough to nanotech that speeding up GPU production is likely more practical than switching to nanotech.
I suspect Eliezer believes AI could speed up GPU production dramatically without nanotech. Can someone who believes that explain why they think recent GPU progress has been far from optimal?
Just preceding the 6 OOM claim, EY provides a different naive technical argument as to why he is confident that it is possible to create a mind more powerful than the human brain using much less compute:
I don't see anything naive about the argument that you quoted here (which doesn't say how much less compute). Long, fast chains of serial computation enable some algorithms that are hard to implement on brains. So it seems obvious that such systems will have some better-than-human abilities.
Eliezer doesn't seem naive there until he jumps to implying a 6 OOM advantage on tasks that matter. He would be correct here if there are serial algorithms that improve a lot on the algorithms that matter most for human intelligence. It's not too hard to imagine that evolution overlooked such serial algorithms.
Recent trends in computing are decent evidence that key human pattern-recognition algorithms can't be made much more efficient. That seems to justify maybe 80% confidence that Eliezer is wrong here. My best guess is that Eliezer focuses too much on algorithms where humans are weak.
I'm "relativizing" morality in the sense that Henrich does in The Secret of Our Success and The WEIRDest People in the World: it's mostly a package of heuristics that is fairly well adapted to particular conditions. Humans are not wise enough to justify much confidence in beliefs about which particular heuristics ought to be universalized.
To the extent that a utility function is useful for describing human values, I agree that it is not up for grabs. I'm observing that "satisfy preferences" is closer to a good summary of human utility functions than are particular rules about marriage or about Al Qaeda.
David Friedman's book Law's Order is, in part, an extended argument for that position:
One objection to the economic approach to understanding the logic of law is that law may have no logic to understand. Another and very different objection is that law has a logic but that it is, or at least ought to be, concerned not with economic efficiency but with justice. ... My second answer is that in many, although probably not all, cases it turns out that the rules we thought we supported because they were just are in fact efficient. To make that clearer I have chosen to ignore entirely issues of justice going into the analysis. In measuring the degree to which legal rules succeed in giving everyone what he wants, and judging them accordingly, I treat on an exactly equal plane my desire to keep my property and a thief’s desire to take it. Despite that, as you will see, quite a lot of what looks like justice—for example, laws against theft and the requirement that people who make messes should clean them up—comes out the other end. That, I think, is interesting.
Agreed.
Davidad seems to be aiming for what I'd call infeasible rigor, presumably in hope of getting something that would make me more than 95% confident of success.
I expect we could get to 80% confidence with this basic approach, by weakening the expected precision of the world model, and evaluating the AI on a variety of simulated worlds, to demonstrate that the AI's alignment is not too sensitive to the choice of worlds. Something along the lines of the simulations in Jake Cannell's LOVE in a simbox.
Is 80% confidence the best we can achieve? I don't know.
I can't point to anything concrete from Drexler, beyond him being much more cautious than Eliezer about predicting the speed of engineering projects.
Speaking more for myself than for Drexler, it seems unlikely that AI would speed up nanotech development more than 10x. Engineering new arrangements of matter normally has many steps that don't get sped up by more intelligence.
The initial nanotech systems we could realistically build with current technology would likely depend on unusually pure feedstocks, and would still be likely to break down frequently. So I expect multiple generations of design before nanotech becomes general-purpose enough to matter.
I expect that developing nanotech via human research would require something like $1 billion in thoughtfully spent resources. Significant fractions of that would involve experiments that would be done serially. Sometimes that's because noise makes interactions hard to predict. Sometimes it's due to an experiment needing a product from a prior experiment.
Observing whether an experiment worked is slow, because the tools for nanoscale imaging are extremely sensitive to vibration. Headaches like this seem likely to add up.
I'm puzzled that this post is being upvoted. The author does not sound familiar with Drexler's arguments in Nanosystems.
I don't think we should worry much about how nanotech might affect an AI's abilities, but this post does not seem helpful.
I object to the implication that Eliezer and Drexler have similar positions. Eliezer seems to seriously underestimate how hard nanotech is. Drexler has been pretty cautious about predicting how much research it would require.
I see one of the big advantages of a pause and training run limit (of any size and duration) being that it is a blunt instrument that is relatively easy to evaluate.
It's far from obvious whether such a limit would slow capability growth much.
One plausible scenario is that it would mainly cause systems to be developed in a more modular way. That might make us a bit safer by pushing development more toward what Drexler recommends. Or it might fool most people into thinking there's a pause, while capabilities grow at 95% of the pace they would otherwise have grown at.
I have some slight hopes that this will turn out to play an important role in making AI safe for us. There's nothing obviously impossible about it.
I'll still try to do a lot of my analysis from a security mindset that assumes this won't work. But I expect I see more possibilities when I alternate between hope and fear than when I only use a fearful mindset.
See section 19 of Drexler's CAIS paper (Reframing Superintelligence) for some key limits on what kinds of AI would develop instrumental convergence.
The doomsday argument says there likely won't be trillions of humans like us. It says less about whether that means extinction, or trillions of people who live unrecognizably better lives.
Upvoted for creative thinking. I'm having trouble deciding whether it's a good idea.
My top picks:
- The Evolution of Cooperation, by Axelrod
- The WEIRDest People in the World, by Joseph Henrich
Some weaker endorsements:
- Good and Real, by Gary Drescher
- Reasons and Persons, by Parfit
- Kanzi, by Sue Savage-Rumbaugh
- Nonzero, by Robert Wright
- Trust, by Fukuyama
- Simple Rules for a Complex World, by Richard A. Epstein
- The Elephant in the Brain, by Kevin Simler and Robin Hanson
DeepMind might be more cautious about what it releases, and/or developing systems whose power is less legible than GPT. I have no real evidence here, just vague intuitions.
I expect a speculative bubble in AI stocks sometime this decade. But that bubble won't be important until ordinary investors are talking a lot about their AI investments. We don't seem close to that stage yet.
The main effect might be reduced interpretability due to more superposition?
I find it odd that exactly one option says "Not in principle mutex with all other answers." I expect several of the higher-ranked options will contribute to success.
I signed up with Alcor 26 years ago. I'm fairly sure they had a free-form field for preferences about when and how I'd want to be brought back.
I think the Alcor employee that I talked to about the form had some odd conditions on when he should be brought back, something like not until the fidelity of revival reached some theoretical perfection.
I've found a better way to bet on real rates increasing sometime in the next decade: shorting long-dated Eurodollar futures, while being long 2025 Eurodollar futures (to bet on a nearer-term decline in expected inflation).
The Eurodollar markets are saying that interest rates will start slowly rising after about 2027, implying a mild acceleration of economic growth and/or inflation.
I spent nearly 2 months trying to short the December 2032 contract, slowly reducing my ask price. As far as I can tell, nobody is paying attention to that contract. At the very least, nobody who thinks interest rates will be stable after 2030.
Recent events prompted me to look for other Eurodollar contract months. I ended up trading the September 2029 contracts, which have enough liquidity that I was able to trade a decent amount within a few days.
That gives me a fairly pure bet on short-term interest (nominal) rates rising between 2025 and 2029.
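For concreteness, here's the payoff arithmetic on that kind of calendar spread. Eurodollar futures are quoted as 100 minus the implied 3-month rate, and a one-basis-point move is worth $25 per contract; the entry prices and rate moves below are hypothetical, not my actual fills.

```python
# Hypothetical payoff for a short Sep 2029 / long 2025 Eurodollar spread.
BP_VALUE = 25.0  # dollars per basis point per contract

def leg_pnl(entry_price, exit_price, contracts, short=False):
    """P&L in dollars for one futures leg (price is 100 minus the implied rate)."""
    move_bp = (exit_price - entry_price) * 100  # price points -> basis points
    return (-1.0 if short else 1.0) * move_bp * BP_VALUE * contracts

# Scenario: 2029 implied rates rise 1.00% (price drops a full point),
# 2025 implied rates fall 0.50% (price rises half a point).
pnl = (leg_pnl(entry_price=96.50, exit_price=95.50, contracts=1, short=True)
       + leg_pnl(entry_price=97.00, exit_price=97.50, contracts=1))
print(f"Net P&L on the spread: ${pnl:,.0f}")  # 2,500 + 1,250 = $3,750
```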
Some top scientists are crazy enough that it would be disastrous to give them absolute power.
I mostly agree with Holden, but think he's aiming to use AIs with more CIS than is needed or safe.
If only one AI passes this threshold and it works to end humanity either directly or indirectly, humanity has zero chance of survival.
No, zero is not a probability.
Eliezer thinks your strategy won't work because AIs will collude. I think that's not too likely at critical stages.
I can imagine that having multiple AIs of unclear alignment is bad because race dynamics cause them to do something reckless.
But my best guess is that having multiple AIs is good under the most likely scenarios.