What evidence do you have about how much time it takes per day to maintain the effect after the end of the 2 weeks?
The "securities with huge variance" idea is already somewhat widely used. See how much EA charities get from crypto and tech-startup stock donations.
It's unclear whether the perfectly anti-correlated pair improves this kind of strategy. I guess you're trying to make the strategy more appealing to risk-averse investors? That sounds like it should work, but it's hard, because risk-averse investors don't want to be early adopters of a new strategy.
Doesn't this depend on what we value?
In particular, you appear to assume that we care about events outside of our lightcone in roughly the way we care about events in our near future. I'm guessing a good deal of skepticism of ECL is a result of people not caring much about distant events.
I had nitrous oxide once at a dentist. It is a dissociative anesthetic. It may have caused something like selective amnesia. I remember that the dentist was drilling, but I have no clear memory of pain associated with it. It's a bit hard to evaluate exactly what it does, but it definitely has some benefits. Maybe the pain seemed too distant from me to be worth my attention?
A much higher fraction of the benefits of prediction markets are public goods.
Most forms of insurance took a good deal of time and effort before they were widely accepted. It's unclear whether there's a dramatic difference in the rate of adoption of prediction markets compared to insurance.
I'm reaffirming my relatively extensive review of this post.
The simbox idea seems like a valuable guide for safely testing AIs, even if the rest of the post turns out to be wrong.
Here's my too-terse summary of the post's most important (and more controversial) proposal: have the AI grow up in an artificial society, learning self-empowerment and learning to model other agents. Use something like retargeting the search to convert the AI's goals from self-empowerment to empowering other agents.
I'm reaffirming my relatively long review of Drexler's full QNR paper.
Drexler's QNR proposal seems like it would, if implemented, guide AI toward more comprehensible systems. It might modestly speed up capabilities advances, while being somewhat more effective at making alignment easier.
Alas, the full paper is long, and not an easy read. I don't think I've managed to summarize its strengths well enough to persuade many people to read it.
This post didn't feel particularly important when I first read it.
Yet I notice that I've been acting on the post's advice since reading it. E.g. being more optimistic about drug companies that measure a wide variety of biomarkers.
I wasn't consciously doing that because I updated due to the post. I'm unsure to what extent the post changed me via subconscious influence, versus deriving the ideas independently.
Exchanges require more capital to move the price closer to the extremes than to move it closer to 50%.
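A back-of-the-envelope sketch of this claim, using Hanson's logarithmic market scoring rule (LMSR) as a stand-in for an exchange (my simplification, not part of the original point): the capital needed to push the price up by ten points grows sharply as the price approaches the extremes.

```python
import math

def cost_to_move(p_from, p_to, b=100.0):
    """Capital needed to push an LMSR market maker's yes-price from
    p_from up to p_to, with liquidity parameter b.
    Follows from the LMSR cost function C(q) = b*ln(sum_i exp(q_i/b)):
    buying yes shares until the price reaches p_to costs
    b * ln((1 - p_from) / (1 - p_to))."""
    return b * math.log((1 - p_from) / (1 - p_to))

# A 10-point move near 50% is far cheaper than the same move near the extreme:
center = cost_to_move(0.50, 0.60)   # b * ln(0.50 / 0.40)
extreme = cost_to_move(0.85, 0.95)  # b * ln(0.15 / 0.05)
print(f"50%->60%: {center:.1f}, 85%->95%: {extreme:.1f}")
```

With b = 100, the 50%→60% move costs about 22 units of capital while the 85%→95% move costs about 110, roughly five times as much.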
This post is one of the best available explanations of what has been wrong with the approach used by Eliezer and people associated with him.
I had a pretty favorable recollection of the post from when I first read it. Rereading it convinced me that I still managed to underestimate it.
In my first pass at reviewing posts from 2022, I had some trouble deciding which post best explained shard theory. Now that I've reread this post during my second pass, I've decided this is the most important shard theory post. Not because it explains shard theory best, but because it explains what important implications shard theory has for alignment research.
I keep being tempted to think that the first human-level AGIs will be utility maximizers. This post reminds me that maximization is perilous. So we ought to wait until we've brought greater-than-human wisdom to bear on deciding what to maximize before attempting to implement an entity that maximizes a utility function.
Oops. I misread which questions you were comparing.
Now that I've read the full questions in the actual paper, it looks like some of the difference is due to "within 100 years" versus at any time horizon.
I consider it far-fetched that much of the risk is over 100 years away, but it's logically possible, and Robin Hanson might endorse a similar response.
I don't quite see this logical contradiction that your Twitter poll asks about.
I wouldn't be surprised if the answers reflect framing effects. But the answers seem logically consistent if we assume that some people believe that severe disempowerment is good.
The Fed can stimulate nominal demand at the ZLB. But (outside of times when it's correcting the results of overly tight monetary conditions) that means mostly more inflation, and has strongly diminishing returns on increased real consumption.
Eventually the economy would reach a new equilibrium (which presumably would contain the same amount of private consumption as the old equilibrium).
I expect less consumption in the new equilibrium.
The Fed has limited power to affect real demand. Fed stimulus is only helpful if there's unemployment due to something like deflation.
I realize now that some of this post was influenced by a post that I'd forgotten reading: Causal confusion as an argument against the scaling hypothesis, which does a better job of explaining what I meant by causal modeling being hard.
I agree there's something strange about Loyal's strategy.
But it's not as if all aging researchers endorse Loyal's approach. Intervene Immune has been getting good biomarker results in human trials by taking nearly the opposite approach: raising IGF-1 levels for a while.
I wrote a longer discussion about IGF-1 and aging in my review of Morgan Levine's book True Age.
If someone comes into the hospital
That's a bad criterion to use.
See Robin Hanson's Buy Health proposal for a better option.
Is this the post you're looking for?
I've got a Mercedes with an Active Blind Spot Assist that eliminates the need to worry about this.
I understand how we can avoid trusting an AI if we've got a specification that the proof checker understands.
Where I expect to need an AI is for generating the right specifications.
Note that effectively we are saying to trust the neural network
I expect that we're going to have to rely on some neural networks regardless of how we approach AI. This paper guides us to be more strategic about what reliance to put on which neural networks.
Freitas' paper on ecophagy has a good analysis of these issues.
I initially dismissed Orthogonal due to a guess that their worldview was too similar to MIRI's, and that they would give up or reach a dead end for reasons similar to why MIRI hasn't made much progress.
Then the gears to ascension prodded me to take a closer look.
Now that I've read their more important posts, I'm more confused.
I still think Orthogonal has a pretty low chance of making a difference, but there's enough that's unique about their ideas to be worth pursuing. I've donated $15k to Orthogonal.
See also my review of his book Cool It. He's often right, but is not a reliable source.
Eliminating the profit motive would likely mean that militaries develop dangerous AI a few years later.
I'm guessing that most people's main reason is that it looks easier to ban AI research than to sufficiently reduce the profit motive.
The belief in a universal, independent standard for altruism, morality, and right and wrong is deeply ingrained in societal norms.
That's true of the norms in WEIRD cultures. It is far from universal.
I expect such acausal collaboration to be harder to develop than good calibration, and therefore less likely to happen at the stage I have in mind.
Another report of progress: Mapping the Mind: Worm’s Brain Activity Fully Decoded (full paper).
the people choosing this many white cars seem low-level insane
The increase in white cars seems to follow a 2007 study An Investigation into the Relationship between Vehicle Colour and Crash Risk which says light-colored cars are safer. Maybe it's just a coincidence.
Thank you for narrowing my confusion over what AI_0 does.
My top question now is: how long does AI_0 need to run, and why is it safe from other AIs during that period?
AI_0 appears to need a nontrivial fraction of our future lightcone to produce a decent approximation of the intended output. Yet keeping it boxed seems to leave the world vulnerable to other AIs.
I disagree. The macro environment is good enough that the Fed could easily handle any contraction, provided they focus on forward-looking indicators, such as the TIPS spread, or near-real-time indicators such as the ISM purchasing managers' numbers.
Now seems like a good time for the Fed to start decreasing interest rates.
On inflation, see Kevin Erdmann (also here).
This is less than half correct.
There's still a widespread labor shortage. A slowdown might mean significant unemployment in Silicon Valley, but it will mean a return to normal in most places.
Inflation is back to normal. It only looks high to people who are focused on lagging indicators such as the CPI.
Most of the problem with the reference ranges is that they are usually just intended to reflect what 95% of the reference population will have. That's much easier to measure than the range which indicates good health.
There isn't much incentive for any authority to establish guidelines for healthy ranges. So too many people end up equating "normal" results with good results, because normal is what gets quantified, and is usually what is reported on test results.
I recommend comparing your ideas to a similar proposal in this post.
I see hints that a fair amount of value might be hiding in this post. Here's an attempt at rewriting the parts of this post that I think I understand, with my own opinions shown in {braces}. I likely changed a good deal of the emphasis to reflect my worldview. I presume my comments will reveal some combination of my confused mangling of your ideas and your cryptic communication style. I erred on the side of rewriting it from scratch to reduce the risk that I copy your text without understanding it. I'm posting a partial version of it in order to get feedback on how well I've understood the first half, before deciding how to tackle the harder parts.
Eliezer imagined a rapid AI takeoff via AI's being more logical (symbolic) than humans, enabling them to better compress evidence about reality, and to search more efficiently through the space of possible AI designs to more rapidly find improvements. {It's hard to pin down Eliezer's claims clearly here, since much of what he apparently got wrong was merely implicit, not well articulated. Why do you call it regression? Did Eliezer expect training data to matter? }
This was expected to produce a more coherent, unified mind than biological analogies suggested. Eliezer imagined that such minds are very sensitive to initial conditions. { I'm unclear whether framing this in chaos theory terms captures Eliezer's intent well. I'd frame it more in terms of first-mover advantage, broadly applied to include the most powerful parts of an AI's goal stomping out other goals within the AI. }
Recent advances in AI suggest that Eliezer overestimated the power of the kind of rigorous, symbolic thinking associated with math (and/or underestimated the power of connectionist approaches?).
Neural nets provide representations of knowledge that are smooth, in the sense that small changes in evidence / input generate small changes in how the resulting knowledge is encoded. E.g. as a small seedling slowly becomes tall enough to be classified as a tree, the neural net alters its representation from "slightly treelike" to "pretty treelike".
In contrast, symbolic approaches to AI have representations with sharp boundaries. This produces benefits in some conspicuous human interactions (e.g. we want to design the rules of chess so there's no room for a concept like "somewhat checkmated").
It wasn't obvious in advance which approach would work better for having an AI write better versions of its own code. We now have enough evidence to say that the neural net approach can more usefully absorb large amounts of data, while doing a tolerable job of creating sharp boundaries where needed.
One can imagine something involving symbolic AI that embodies knowledge in a form that handles pattern matching so as to provide functionality similar to neural networks. In particular, it would need to encode the symbolic knowledge in a way that improved versions of the symbolic source code are somehow "near" the AI's existing source code. This "nearness" would provide a smoothness that's comparable to what gradient descent exploits.
{Drexler's QNR? Combining symbolic and connectionist AI. Probably not close to what Eliezer had in mind, and doesn't look like it would cause a much faster takeoff than what Deep Learning suggests. QNR leaves much key knowledge in "inscrutable matrices", which I gather is incompatible with Eliezer's model.} {I'm guessing you use the term "short programs" to indicate that in what might be Eliezer's model, code remains separate from the knowledge database, and the important intelligence increases can be accomplished via rewriting the code, while leaving the database relatively constant? Unlike neural nets, where intelligence and a database need to be intertwined. } {I have little idea whether you're accurately portraying Eliezer's model here.}
Neural networks work because they are able to represent knowledge so that improved ideas are near existing ideas. That includes source code: when using neural nets to improve source code, that "nearness" enables a smooth, natural search for better source code.
Eliezer freaks out about foom due to the expectation that there's some threshold of intelligence above which a symbolic AI can do something as powerful as gradient descent on its own source code, presumably without the training phases that neural networks need. Existing research does not suggest that's imminent. We're in trouble if it happens before we have good ways to check each step for safety.
Oops, you're right. Section 36.6 does advocate modularity, in a way that hints at the vibe you describe. And my review of the CAIS paper did say things about modularity that seem less likely now than they did 4 years ago.
I agree that people have gotten vibes from the paper which have been somewhat discredited.
Yet I don't see how that vibe followed from what he wrote. He tried to clarify that having systems with specialized goals does not imply they have only narrow knowledge. See section 21 of the CAIS paper ("Broad world knowledge can support safe task performance").
Are people collapsing "AI with narrow goals" and "AI with only specialized knowledge" into one concept "narrow AI"?
Verified safe software means the battle shifts to vulnerabilities in any human who has authority over the system.
What I don’t understand is, either in my model or Critch’s, where we find more hope by declining a pivotal act, once one becomes feasible?
Part of the reason for more hope is that people are more trustworthy if they commit to avoiding the worst forms of unilateralist curses and world conquest. So by committing to avoid a pivotal act, leading actors become more likely to cooperate in ways that avoid the need for one.
If a single pivotal act becomes possible, then it seems likely that it will also be possible to find friendlier pivotal processes that include persuading most governments to take appropriate actions. An AI that can melt nearly all GPUs will be powerful enough to scare governments into doing lots of things that are currently way outside the Overton window.
Cheap printing was likely a nontrivial factor, but was influenced by much more than just the character sets. Printing presses weren't very reliable or affordable until a bunch of component technologies reached certain levels of sophistication. Even after they became practical, most cultures had limited interest in them.
Filtering out entire sites seems too broad and too crude to have much benefit.
I see plenty of room to turn this into a somewhat good proposal by having GPT-4 look through the dataset for a narrow set of topics. Something close to "how we will test AIs for deception".
A good deal of this post is correct. But the goals of language models are more complex than you admit, and not fully specified by natural language. LLMs do something that's approximately a simulation of a human. Those simulated quasi-humans are likely to have quasi-human goals that are unstated and tricky to observe, for much the same reasons that humans have such goals.
LLMs also have goals that influence what kind of human they simulate. We'll know approximately what those goals are, due to our knowledge of what generated those goals. But how do we tell whether approximately is good enough?
No. I found a claim of good results here. Beyond that I'm relying on vague impressions from very indirect sources, plus fictional evidence such as the movie Latter Days.
Many rationalists do follow something resembling the book's advice.
CFAR started out with too much emphasis on lecturing people, but quickly noticed that wasn't working, and pivoted to more emphasis on listening to people and making them feel comfortable. This is somewhat hard to see if you only know the rationalist movement via its online presence.
Eliezer is far from being the world's best listener, and that likely contributed to some failures in promoting rationality. But he did attract and encourage people who overcame his shortcomings for CFAR's in-person promotion of rationality.
I consider it pretty likely that CFAR's influence has caused OpenAI to act more reasonably than it otherwise would act, due to several OpenAI employees having attended CFAR workshops.
It seems premature to conclude that rationalists have failed, or that OpenAI's existence is bad.
Sorry, it doesn’t look like the conservatives have caught on to this kind of approach yet.
That's not consistent with my experiences interacting with conservatives. (If you're evaluating conservatives via broadcast online messages, I wouldn't expect you to see anything more than tribal signaling).
It may be uncommon for conservatives to use effective approaches for explicitly changing political beliefs. That's partly because politics is less central to conservatives' lives. You'd likely reach a more nuanced conclusion if you compared how Mormons persuade people to join their religion, which incidentally persuades people to become more conservative.
Does the literature on the economics of reputation have ideas that are helpful?
I haven't thought this out very carefully. I'm imagining a transformer trained both to predict text, and to predict the next frame of video.
Train it on all available videos that show realistic human body language.
Then ask the transformer to rate on a numeric scale how positively or negatively a human would feel in any particular situation.
This does not seem sufficient for a safe result, but implies that LeCun is less nutty than your model of him suggests.
Why assume LeCun would use only supervised learning to create the IC module?
If I were trying to make this model work, I'd use mainly self-supervised learning that's aimed at getting the module to predict what a typical human would feel. (I'd also pray for a highly multipolar scenario if I were making this module immutable when deployed.)
Might this paradigm be tested by measuring LLM fluid intelligence?
I predict that a good test would show that current LLMs have modest amounts of fluid intelligence, and that LLM fluid intelligence will increase in ways that look closer to continuous improvement than to a binary transition from nothing to human-level.
I'm unclear whether it's realistic to get a good enough measure of fluid intelligence to resolve this apparent crux, but I'm eager to pursue any available empirical tests of AI risk.
Upvoted for clarifying a possibly important crux. I still have trouble seeing a coherent theory here.
I can see a binary difference between Turing-complete minds and lesser minds, but only if I focus on the infinite memory and implicitly infinite speed of a genuine Turing machine. But you've made it clear that's not what you mean.
When I try to apply that to actual minds, I see a wide range of abilities at general-purpose modeling of the world.
Some of the differences in what I think of as general intelligence are a function of resources, which implies a fairly continuous scale, not a binary distinction.
Other aspects are a function of accumulated knowledge. That's somewhat lumpier, but still doesn't look close to a binary difference.
Henrich's books The Secret of Our Success and The WEIRDest People in the World suggest that humans have been gradually building up the ability to handle increasingly abstract problems.
Our ancestors of a couple million years ago had language that enabled them to handle a somewhat larger class of mental tasks than other apes.
Tools such as writing, and new concepts such as the Turing machine, enabled them to model ideas that they'd previously failed to find ways to handle.
I see plenty of hints that other mammals have weaker versions of this abstract thought. I'd be surprised if humans have reached the limits of what is possible.
So, when I try to treat general intelligence as a binary, I alternate between doubting that humans have it, and believing that most animals and LLMs have it.
In the hypothetical where there’s no general intelligence, there’s no such thing as “smarter”,
It sure looks like many species of animals can be usefully compared as smarter than others. The same is true of different versions of LLMs. Why shouldn't I conclude that most of those have what you call general intelligence?