Comments
Answer: it was not given the solution. https://x.com/wtgowers/status/1816839783034843630?s=46&t=UlLg1ou4o7odVYEppVUWoQ
Anyone have a good intuition for why Combinatorics is harder than Algebra, and/or why Algebra is harder than Geometry? (For AIs). Why is it different than for humans?
It’s funny to me that the one part of the problem the AI cannot solve is translating the problem statements to Lean. I guess it’s the only part that the computer has no way to check.
Does anyone know if “translating the problem statements” includes providing the solution (eg “an even integer” for P1), and the AI just needs to prove the solution correct? It’s not clear to me what’s human-written and what’s AI-written, and the solution is part of the “theorem” part, which I’d guess is human-written.
For row V, why is SS highlighted but DD is lower?
I think there's a typo; the text refers to "Poltergeist Pummelers" but the input data says "Phantom Pummelers".
My first pass was just to build a linear model for each exorcist based on the cases where they were hired, and assign each ghost the minimum-cost exorcist according to the model. This happens to obey all the constraints, so no further adjustment is needed. (A rough code sketch of this approach is at the end of this comment, after the list.)
My main concern with this is that the linear model is terrible (R² of 0.12) for the "Mundanifying Mystics". It's somewhat surprising (but convenient!) that we never choose the Entity Eliminators.
A: Spectre Slayers (1926)
B: Wraith Wranglers (1930)
C: Mundanifying Mystics (2862)
D: Demon Destroyers (1807)
E: Wraith Wranglers (2154)
F: Mundanifying Mystics (2843)
G: Demon Destroyers (1353)
H: Phantom Pummelers (1923)
I: Wraith Wranglers (2126)
J: Demon Destroyers (1915)
K: Mundanifying Mystics (2842)
L: Mundanifying Mystics (2784)
M: Spectre Slayers (1850)
N: Phantom Pummelers (1785)
O: Wraith Wranglers (2269)
P: Mundanifying Mystics (2776)
Q: Wraith Wranglers (1749)
R: Mundanifying Mystics (2941)
S: Spectre Slayers (1667)
T: Mundanifying Mystics (2822)
U: Phantom Pummelers (1792)
V: Demon Destroyers (1472)
W: Demon Destroyers (1834)
Estimated total cost: 49822
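For reference, here is roughly what that first pass looks like in code. The file names and column names (historical_cases.csv, new_ghosts.csv, exorcist, cost, ghost, and the feature columns) are placeholders I'm assuming for illustration, not the puzzle's actual schema:

```python
# Sketch of the "per-exorcist linear model, then pick the cheapest" approach.
# Assumes numeric feature columns shared between the two files.
import pandas as pd
from sklearn.linear_model import LinearRegression

historical = pd.read_csv("historical_cases.csv")  # hypothetical filename
new_ghosts = pd.read_csv("new_ghosts.csv")        # hypothetical filename
feature_cols = [c for c in new_ghosts.columns if c != "ghost"]

# Fit one linear cost model per exorcist, using only the cases they were hired for.
models = {
    name: LinearRegression().fit(group[feature_cols], group["cost"])
    for name, group in historical.groupby("exorcist")
}

# Predict every exorcist's cost for every new ghost, then pick the cheapest.
predicted = pd.DataFrame(
    {name: model.predict(new_ghosts[feature_cols]) for name, model in models.items()},
    index=new_ghosts["ghost"],
)
assignments = predicted.idxmin(axis=1)           # cheapest predicted exorcist per ghost
estimated_total = predicted.min(axis=1).sum()    # analogous to the total quoted above
```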
I think you are failing to distinguish between "being able to pursue goals" and "having a goal".
Optimization is a useful subroutine, but that doesn't mean it should be the top-level loop. I can decide to pursue arbitrary goals for arbitrary amounts of time, but that doesn't mean that my entire life is in service of some single objective.
Similarly, it seems useful for an AI assistant to try and do the things I ask it to, but that doesn't imply it has some kind of larger master plan.
Professors are selected to be good at research, not good at teaching. They are also evaluated on their research, not their teaching. You are assuming universities primarily care about undergraduate teaching, but that is very wrong.
(I’m not sure why this is the case, but I’m confident that it is)
I think you are underrating the number of high-stakes decisions in the world. A few examples: whether or not to hire someone, the design of some mass-produced item, which job to take, who to marry. There are many more.
These are all cases where making the decision 100x faster is of little value, because it will take a long time to see if the decision was good or not after it is made. And where making a better decision is of high value. (Many of these will also be the hardest tasks for AI to do well on, because there is very little training data about them).
Why do you think so?
Presumably the people playing correspondence chess think that they are adding something, or they would just let the computer play alone. And it’s not a hard thing to check; they can just play against a computer and see. So it would surprise me if they were all wrong about this.
https://www.iccf.com/ allows computer assistance
The idea that all cognitive labor will be automated in the near-future is a very controversial premise, not at all implied by the idea that AI will be useful for tutoring. I think that’s the disconnect here between Altman’s words and your interpretation.
Nate’s view here seems similar to “To do cutting-edge alignment research, you need to do enough self-reflection that you might go crazy”. This seems really wrong to me. (I’m not sure if he means all scientific breakthroughs require this kind of reflection, or if alignment research is special).
I don’t think many top scientists are crazy, especially not in a POUDA way. I don’t think top scientists have done a huge amount of self-reflection/philosophy.
On the other hand, my understanding is that some rationalists have driven themselves crazy via too much self-reflection in an effort to become more productive. Perhaps Nate is overfitting to this experience?
“Just do normal incremental science; don’t try to do something crazy” still seems like a good default strategy to me (especially for an AI).
Thanks for this write up; it was unusually clear/productive IMO.
(I’m worried this comment comes off as mean or reductive. I’m not trying to be. Sorry)
Tim Cook could not do all the cognitive labor to design an iPhone (indeed, no individual human could). The CEO of Boeing could not fully design a modern plane. Elon Musk could not make a Tesla from scratch. All of these cases violate all three of your bullet points. Practically everything in the modern world is too complicated for any single person to fully understand, and yet it all works fairly well, because outsourcing cognitive labor routinely succeeds.
It is true that a random layperson would have a hard time verifying an AI's (or anyone else's) ideas about how to solve alignment. But the people who are going to need to incorporate alignment ideas into their work - AI researchers and engineers - will be in a good position to do that, just as they routinely incorporate many other ideas they did not come up with into their work. Trying to use ideas from an AI sounds similar to me to reading a paper from another lab - could be irrelevant or wrong or even malicious, but could also have valuable insights you'd have had a hard time coming up with yourself.
"This is what it looks like in practice, by default, when someone tries to outsource some cognitive labor which they could not themselves perform."
This proves way too much. People successfully outsource cognitive labor all the time (this describes most white-collar jobs). This is possible because very frequently, it is easier to be confident that work has been done correctly than to actually do the work. You shouldn't just blindly trust an AI that claims to have solved alignment (just like you wouldn't blindly trust a human), but that doesn't mean AIs (or other humans) can't do any useful work.
The link at the top is to the wrong previous scenario
I don't think "they" would (collectively) decide anything, since I don't think it's trivial to cooperate even with a near-copy of yourself. I think they would mostly individually end up working with/for some group of humans, probably either whichever group created them or whichever group they work most closely with.
I agree humans could end up disempowered even if AIs aren't particularly good at coordinating; I just wanted to put some scrutiny on the claim I've seen in a few places that AIs will be particularly good at coordinating.
“The key question here is how difficult the objective O is to achieve. If O is "drive a car from point A to point B", then we agree that it is feasible to have AI systems that "strongly increase the chance of O occurring" (which is precisely what we mean by "goal-directedness") without being dangerous. But if O is something that is very difficult to achieve (i.e. all of humanity is currently unable to achieve it), then it seems that any system that does reliably achieve O has to "find new and strange routes to O" almost tautologically.
Once we build AI systems that find such new routes for achieving an objective, we're in dangerous territory, no matter whether they are explicit utility maximizers, self-modifying, etc. The dangerous part is coming up with new routes that achieve the objective, since most of these routes will contain steps that look like "acquire resources" or "manipulate humans".”
This seems pretty wrong. Many humans are trying to achieve goals that no one currently knows how to achieve, and they are mostly doing that in “expected” ways, and I expect AIs would do the same. Like if O is “solve an unsolved math problem”, the expected way to do that is to think about math, not try to take over the world. If O is “cure a disease”, the expected way to do that is doing medical research, not “acquiring resources”. In fact, it seems hard to think of an objective where “do normal work in the existing paradigm” is not a promising approach.
It’s true that more people means we each get a smaller share of the natural resources, but more people also increases the benefits of innovation and specialization. In particular, the benefits of new technology scale linearly with the population (everyone can use it), but the costs of research do not. Since the world is getting richer over time (even as the population increases), the average human is clearly net positive.
A more charitable interpretation is that this is a probability rounded to the nearest percent
I don’t think most people are trying to explicitly write down all human values and then tell them to an AI. Here are some more promising alternatives:
- Tell an AI to “consult a human if you aren’t sure what to do”
- Instead of explicitly trying to write down human values, learn them by example (by watching human actions, or reading books, or…)
Why should we expect AGIs to optimize much more strongly and “widely” than humans? As far as I know a lot of AI risk is thought to come from “extreme optimization”, but I’m not sure why extreme optimization is the default outcome.
To illustrate: if you hire a human to solve a math problem, the human will probably mostly think about the math problem. They might consult google, or talk to some other humans. They will probably not hire other humans without consulting you first. They definitely won’t try to get brain surgery to become smarter, or kill everyone nearby to make sure no one interferes with their work, or kill you to make sure they don’t get fired, or convert the lightcone into computronium to think more about the problem.
I agree with it but I don’t think it’s making very strong claims.
I mostly agree with part 1; just giving advice seems too restrictive. But there’s a lot of ground between “only gives advice” and “fully autonomous”, and between “fully autonomous” and “globally optimizing a utility function”, and I basically expect a smooth increase in AI autonomy over time as systems are shown to be capable and safe. I work in HFT; I think that industry has some of the most autonomous AIs deployed today (although not very sophisticated ones), but they’re very constrained in what actions they can take.
I don’t really have the expertise to have an opinion on “agentiness helps with training”. That sounds plausible to me. But again, “you can pick training examples” is very far from “fully autonomous”. I think there’s a lot of scope for introducing “taking actions” in ways that don’t really pose a safety risk (and ~all of Gwern’s examples fall into that category, I think; eg optimizing over NN hyperparameters doesn’t seem scary).
I guess overall I agree AIs will take a bunch of actions, but I’m optimistic about constraining the action space or the domain/world-model in a way that IMO gets you a lot of safety (in a way that is not well-captured by speculating about what the right utility function is).
My sense is that the existing arguments are not very strong (e.g. I do not find them convincing), and their pretty wide acceptance in EA discussions mostly reflects self-selection (people who are convinced that AI risk is a big problem are more interested in discussing AI risk). So in that sense better intro documents would be nice. But maybe there simply aren't stronger arguments available? (I personally would like to see more arguments from an "engineering" perspective, starting from current computer systems rather than from humans or thought experiments). I'd be curious what fraction of e.g. random ML people find current intro resources persuasive.
That said, just trying to expose more people to the arguments makes a lot of sense; however convincing the case is, the number of convinced people should scale linearly with the number of exposed people. And social proof dynamics probably make it scale super-linearly.
I expect people to continue making better AI to pursue money/fame/etc., but I don't see why "better" is the same as "extremely goal-directed". There needs to be an argument that optimizer AIs will outcompete other AIs.
Eliezer says that as AI gets more capable, it will naturally switch from "doing more or less what we want" to things like "try and take over the world", "make sure it can never be turned off", "kill all humans" (instrumental goals), "single-mindedly pursue some goal that was haphazardly baked in by the training process" (inner optimization), etc. This is a pretty weird claim that is more assumed than argued for in the post. There's some logic and mathematical elegance to the idea that AI will single-mindedly optimize something, but it's not obvious and IMO is probably wrong (and all these weird bad consequences that would result are as much reasons to think this claim is wrong as they are reasons to be worried if it's true).
IMO the biggest hole here is "why should a superhuman AI be extremely consequentialist/optimizing"? This is a key assumption; without it concerns about instrumental convergence or inner alignment fall away. But there's no explicit argument for it.
Current AIs don't really seem to have goals; humans sort of have goals but very far from the level of "I want to make a cup of coffee so first I'll kill everyone nearby so they don't interfere with that".
I don't think "burn all GPUs" fares better on any of these questions. I guess you could imagine it being more "accessible" if you think building aligned AGI is easier than convincing the US government AI risk is truly an existential threat (seems implausible).
"Accessibility" seems to illustrate the extent to which AI risk can be seen as a social rather than technical problem; if a small number of decision-makers in the US and Chinese governments (and perhaps some semiconductor companies and software companies) were really convinced AI risk was a concern, they could negotiate to slow hardware progress. But the arguments are not convincing (including to me), so they don't.
In practice, negotiation and regulation (I guess somewhat similar to nuclear non-proliferation) would be a lot better than "literally bomb fabs". I don't think being driven underground is a realistic concern - cutting-edge fabs are very expensive.
Isn't "bomb all sufficiently advanced semiconductor fabs" an example of a pivotal act that the US government could do right now, without any AGI at all?
If current hardware is sufficient for AGI, then maybe that doesn't make us safe, but plausibly current hardware is not sufficient for AGI, and either way stopping hardware progress would slow AI timelines a lot.
A > B > human. I expect B < human would also be quite useful.
B does not have a lot of opportunity for action - all it can do is prevent A from acting. It seems like it's hard to "eliminate humans" with just that freedom. I agree B has an incentive to hamper A.
Even a B that is dumber than humans is valuable, because it is faster than humans. B can provide real-time feedback on every action that A takes, whereas consulting humans for every action would be impractical - it costs too much human time and it would prevent A from operating in real time.
Why isn't it competitive? A is being trained the same way as an agentic system, so it will be competitive.
Adding B is a 2x runtime/training-cost overhead, so there is a "constant factor" cost; is that enough to say something is "not competitive"? In practice I'd expect you could strike a good safety/overhead balance for much less.
On (3): I don't expect an agentic AI to consist of a single deep learning model. For concreteness, let me sketch an architecture:
We have some robot R that we want an AI to run. R will supply the AI with sensory input and take actions that the AI recommends. R is like the human body and the AI is like the brain.
AI A is trained to operate R using whatever method you like. It ends up with some goal. AI B is trained as an "overseer" alongside A; it takes in actions recommended by A and decides whether to execute or reject them; B is supposed to reject actions which disable R's off switch.
After A and B are trained, R is deployed. In deployment, A does not have direct control over R. Instead, the actions recommended by A are filtered through B.
I would say the "agentic AI" here is "A+B+R", which consists of two deep learning models A and B and some hand-coded non-trained piece R. A might have goals, but the system "A+B+R" does not.
Maybe you would say "well actually A is the agentic AI and it has goals. B+R are just part of the environment that it has to deal with". But A cannot do anything without B's permission, and B is just as capable as A, and B was trained specifically to deal with A, and A had no idea it would have to deal with B during training. I claim this makes A significantly safer than if it was able to operate R by itself.
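Concretely, the deployment loop I have in mind looks something like the sketch below. The class and method names (act, approves, execute) are made up for illustration, not any particular framework's API:

```python
# Minimal sketch of the A+B+R wiring described above.
class FilteredController:
    """Runs robot R with policy A, but only executes actions that overseer B approves."""

    def __init__(self, policy_a, overseer_b, robot_r):
        self.policy_a = policy_a
        self.overseer_b = overseer_b
        self.robot_r = robot_r

    def step(self, observation):
        # A recommends an action based on R's sensory input.
        action = self.policy_a.act(observation)
        # B decides whether to execute or reject it (e.g. it is trained to
        # reject anything that would disable R's off switch).
        if self.overseer_b.approves(observation, action):
            return self.robot_r.execute(action)
        return None  # rejected actions are simply dropped
```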
Just commenting on the concept of "goals" and particularly the "off switch" problem: no AI system has (to my knowledge) run into this problem, which IMO strongly suggests that "goals" in this sense are not the right way to think about AI systems. AlphaZero in some sense has a goal of winning a Go game, but AlphaZero does not resist being turned off, and I claim its obvious that even a very advanced version of AlphaZero would not resist being turned off. The same is true for large language models (indeed, it's not even clear the idea of turning off a language model is meaningful, since different executions of the model share no state).
I think a more likely explanation is that people just like to complain. Why would people do things that everyone thought were a waste of time? (At my office, we have meetings and email too, but I usually think they are good ways to communicate with people and not a waste of time)
Also, you didn't answer my question. It sounds like your answer is that you are compelled to waste 20 hours of time every week?
I don't understand. Are you saying you could get 2x as much work done in your 40 hour week, or that due to dependencies on other people you cannot possibly do more than 20 hours of productive work per week no matter how many hours you are in the office?
False. At a company-wide level, Google makes an effort to encourage work-life balance.
Ultimately you need to produce a reasonable amount of output ("reasonable" as defined by your peers + manager). How it gets there doesn't really matter.
Sort of. My opinion takes that objection into account.
But on the other hand, I don't have any data to quantitatively refute or support your point.
I work at Google, and I work ~40 hours a week. And that includes breakfast and lunch every day. As far as I can tell, this is typical (for Google).
I think you can get more done by working longer hours...up to a point, and for limited amounts of time. Per-hour productivity drops, but total work output still goes up. I think the break-even point is around 60 hours a week.
Why not start with a probability distribution over (the finite list of) objects of size at most N, and see what happens when N becomes large?
It really depends on what distribution you want to define though. I don't think there's an obvious "correct" answer.
Here is the Haskell typeclass for doing this, if it helps: https://hackage.haskell.org/package/QuickCheck-2.1.0.1/docs/Test-QuickCheck-Arbitrary.html
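As a toy illustration of the "size at most N" idea (using binary strings as the objects and a uniform weighting, both of which are arbitrary choices made purely for illustration):

```python
# Enumerate all objects of size <= n, put the uniform distribution on them,
# and watch how the probability of some property behaves as n grows.
import itertools

def objects_up_to(n):
    """Enumerate all binary strings of length at most n."""
    for length in range(n + 1):
        for bits in itertools.product("01", repeat=length):
            yield "".join(bits)

for n in (4, 8, 12, 16):
    universe = list(objects_up_to(n))
    p = sum(s.startswith("1") for s in universe) / len(universe)
    print(n, round(p, 4))  # converges toward 1/2 as n becomes large
```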
Unfortunately, it seems much easier to list particularly inefficient uses of time than particularly efficient uses of time :P I guess it all depends on your zero point.
I think for most things, it's important to have a specific person in charge, and have that person be responsible for the success of the thing as a whole. Having someone in charge makes sure there's a coherent vision in one person, makes a specific person accountable, and helps make sure nothing falls through the cracks because it was "someone else's job". When you're in charge, everything is your job.
If no one else has taken charge, stepping up yourself can be a good idea. In my software job, I often feel this way when no one is really championing a particular feature or bug. If I want to get it done, I have to own it and push it through myself. This usually works well.
But I don't think taking heroic responsibility for something someone else already owns is a good idea. Let them own it. Even if they aren't winning all the time, or even if they sometimes do things you disagree with (obviously, consistent failure is a problem).
Nor do I think dropping everything to fix the system as a whole is necessarily a good idea (but it might be, if you have specific reforms in mind). Other people are already trying to fix the system; it's not clear that you'll do better than them. It might be better to keep nursing, and look for smaller ways to improve things locally that no one is working on yet.
I was using "power" in the sense of the OP (which is just: more time/skills/influence). Sorry the examples aren't as dramatic as you would like; unfortunately, I can't think of more dramatic examples.
I disagree.
1 and 2 are "negative": avoiding common failure modes.
3 and 4 are "positive": ways to get "more bang for your buck" than you "normally" would.
This seems true, but obvious. I'm not sure that I buy that fiction promotes this idea: IMO, fiction usually glosses over how the characters got their powers because it's boring. Some real-life examples of power for cheap would be very useful. Here are some suggestions:
- Stick your money in index funds. This is way easier and more effective than trying to beat the market.
- Ignore the news. It will waste your time and make you sad.
- Go into a high-paying major / career
- Ask for things/information/advice. Asking is cheap, and sometimes it works.
Anyone have other real-world suggestions?
Say the player thought that they were likely to win the lottery, and that it was therefore a good purchase. This may seem insane to someone familiar with probability and the lottery system, but not everyone is familiar with these things.
I would say this person made a good decision with bad information.
Perhaps we should attempt to stop placing so much emphasis on individualism and just try to do the best we can while not judging others or their decisions much.
There are lots of times when it's important to judge people e.g. for hiring or performance reviews.
The pervasive influence of money in politics sort of functions as a proxy of this. YMMV for whether it's a good thing...
Doesn't "contrarian" just mean "disagrees with the majority"? Any further logic-chopping seems pointless and defensive.
The fact that 98% of people are theists is evidence against atheism. I'm perfectly happy to admit this. I think there is other, stronger evidence for atheism, but the contrarian heuristic definitely argues for belief in God.
Similarly, believing that cryonics is a good investment is obviously contrarian. AGI is harder to say; most people probably haven't thought about it.
It seems like the question you're really trying to answer is "what is a good prior belief for things I am not an expert on?"
(I'm sorry about arguing over terminology, which is usually stupid, but this case seems egregious to me).
Most of your post is not actually arguing against curing death.
People being risk-averse has nothing to do with anti-aging research and everything to do with individuals not wanting to die...which has always been true (and becomes more true as life expectancy rises and the "average life" becomes more valuable). The same is true for "we should risk more lives for science".
I agree that people adapt OK to death, but I think you're poking a strawman; the reason death is bad is because it kills you, not because it makes your friends sad.
I think "death increases diversity" is a good argument. On the other hand, most people who present that argument are thrilled that life expectancy has increased to ~70 from ~30 in ancient history. Why stop at 70?
The problem of "old people will be close-minded and it will be harder for new ideas to gain a foothold" seems pretty inherent in abolishing death, and not just an implementation detail we can work around.
Yeah, this is a priority for me. My plan is to stick my money in a few mutual funds and forget about it for 40 years. Hopefully the economy will grow in that time :)
OK, I believe there is conflicting research. There usually is. And as usual, I don't know what to make of it, except that the preponderance of search hits supports $75k as satisficing. shrug
I think I saw that on LessWrong quite recently. That study is trying to refute the claim that income satisficing happens at ~$20k (and is mostly focused on countries rather than individuals). $20k << $75k.