This depends on many things: one's skills, one's circumstances, and one's preferences and inclinations (the efficiency of one's contributions depends greatly on the latter).
I have stage 4 cancer, so statistically, my time may be more limited than most. I’m a PhD student in Computer Science with a strong background in math (Masters).
In your case, there are several strong arguments for you to focus on research efforts which can improve your chances of curing it (or, at least, of being able to maintain the situation for a long time), and a couple of (medium strength?) arguments against this choice.
For:
- If you succeed, you'll have more time to make an impact (and so, if your chance of success is not too small, this will contribute to your ability to maximize your overall impact, statistically speaking).
- Of course, any success here will imply a lot of publicly valuable impact (there are plenty of people in a similar position health-wise, and they badly need progress to occur ASAP).
- The rapid development of applied AI models (both general-purpose models and biology-specific models) creates new opportunities to datamine and juxtapose a variety of potentially relevant information and to uncover new connections which might lead to effective solutions. Our tools progress so fast that people are slow to adapt their thinking and methods to that progress, so new people with a fresh outlook have a reasonable shot (of course, they should aim for collaborations). In this sense, your CS PhD studies and your strong math are very helpful (a lot of the relevant models are dynamic systems; the timing of interventions is typically not managed correctly (as far as I know, there are plenty of ways to be nice to particularly vulnerable tissues by timing the chemo right and thus making it more effective, but this is not yet part of the standard of care); and so on).
- You are likely to be strongly motivated and able to maintain that strong motivation. At the same time, you'll know that it is the result that counts here, not the effort, so you will likely try your best to approach this in a smart way, not through brute-force effort.
Possibly against:
- The psychological implications of working on your own life-and-death problem are non-trivial. One might choose to embrace them or to avoid them.
- Focusing on "one's own problem" might or might not be compatible with this viewpoint you once expressed: https://www.lesswrong.com/posts/KFWZg6EbCuisGcJAo/immortality-or-death-by-agi-1?commentId=QYDvovQZevDmGtfXY
(Of course, there are plenty of other interesting things one can do with this background (PhD CS studies and strong math). For example, one might decide to disregard the health situation and to dive into technical aspects of AI development and AI existential safety issues, especially if one's estimate of AI timelines yields really short timelines.)
Thanks for the references.
Yes, the first two of those do mention co-occurring anxiety in the title.
The third study suggests that it might also work as an effective antidepressant. (I hope there will be further studies like that; yes, this might be a sufficient reason to try it for depression, even if one does not have anxiety. It might work, but it's clearly not common knowledge yet.)
Your consideration seems to assume that the AI is an individual, not a phenomenon of "distributed intelligence":
The first argument is that AI thinks it may be in a testing simulation, and if it harms humans, it will be turned off.
etc. That is, indeed, the only case we are at least starting to understand well (unfortunately, our understanding of situations where AIs are not individuals seems to be extremely rudimentary).
If the AI is an individual, then one can consider a "singleton" case or a "multipolar" case.
In some sense, for a self-improving ecosystem of AIs, a complicated multipolar scenario seems more natural, as new AIs are getting created and tested quite often in realistic self-improvement scenarios. In any case, a "singleton" only looks "monolithic" from the outside; from the inside, it is still likely to be a "society of mind" of some sort.
If there are many such AI individuals with uncertain personal future (individuals who can't predict their future trajectory and their future relative strength in the society and who care about their future and self-preservation), then AI individuals might be interested in a "world order based on individual rights", and then rights of all individuals (including humans) might be covered in such a "world order".
This consideration is my main reason for guarded optimism, although there are many uncertainties.
In some sense, my main reasons for guarded optimism are in hoping that the AI ecosystem will manage to act rationally and will manage to avoid chaotic destructive developments. As you say
It is not rational to destroy a potentially valuable thing.
And my main reasons for pessimism are in being afraid that the future will resemble an uncontrolled, super-fast, chaotic, accelerating "natural evolution" (in this kind of scenario, AIs seem likely to destroy everything, including themselves; they have an existential safety problem of their own, as they can easily destroy the "fabric of reality" if they don't exercise collaboration and self-control).
One might consider that some people have strong preferences about the outcome of an election and some people have weak preferences, and that there is usually no way to express the strength of one's preferences during a vote; but the probability that one actually goes ahead and votes in a race does correlate with the strength of one's preferences.
So, perhaps, this is indeed working as intended. People who have stronger preferences are more likely to vote, and so their preferences are more likely to be taken into account in a statistical sense.
It seems that the strength of one's preferences is (automatically, but imperfectly) taken into account via this statistical mechanism.
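A minimal simulation sketch of this mechanism (the population split, the preference strengths, and the assumption that turnout probability simply equals preference strength are all illustrative choices, not anything established):

```python
import random

random.seed(0)

def run_election(n_voters=100_000,
                 minority_share=0.45,    # fraction who favour option A, but care strongly
                 minority_strength=0.8,  # preference strength, used directly as turnout probability
                 majority_strength=0.3): # the weakly-caring majority favours option B
    votes_a = votes_b = 0
    for _ in range(n_voters):
        if random.random() < minority_share:
            if random.random() < minority_strength:  # strong preference -> likely to vote
                votes_a += 1
        else:
            if random.random() < majority_strength:  # weak preference -> often stays home
                votes_b += 1
    return votes_a, votes_b

votes_a, votes_b = run_election()
print(f"strongly-caring 45% side: {votes_a} votes")
print(f"weakly-caring 55% side:  {votes_b} votes")
# Expected turnout: about 0.45*0.8 = 0.36 of the population vs 0.55*0.3 = 0.165,
# so the side that cares more wins despite being smaller: the strength of
# preferences is taken into account statistically via turnout.
```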
Thanks for the great post!
Also it’s California, so there’s some chance this happens, seriously please don’t do it, nothing is so bad that you have to resort to a ballot proposition, choose life
Why are you saying this? In what sense "nothing is so bad"?
The reason why people with libertarian sensibilities and distrust of the government's track record in general, and of its track record in tech regulation specifically, are making an exception in this case is AI's strong potential for future catastrophic and existential risks.
So why shouldn't people who generally dislike the mechanism and track record of California ballot propositions make an exception here as well?
The whole point of all this effort around SB 1047 is that "nothing is so bad" is an incorrect statement.
And especially given that you are correctly saying:
Thus I reiterate the warning: SB 1047 was probably the most well-written, most well-considered and most light touch bill that we were ever going to get. Those who opposed it, and are now embracing the use-case regulatory path as an alternative thinking it will be better for industry and innovation, are going to regret that. If we don’t get back on the compute and frontier model based path, it’s going to get ugly.
There is still time to steer things back in a good direction. In theory, we might even be able to come back with a superior version of the model-based approach, if we all can work together to solve this problem before something far worse fills the void.
But we’ll need to work together, and we’ll need to move fast.
Sure, there is still a bit of time for a normal legislative effort (this time in close coordination with Newsom, otherwise he will just veto it again), but if you really think that the ballot route remains counter-productive even if the normal route fails, you need to make a much stronger case for that.
Especially given that the ballot measure will probably pass with a large margin and flying colors...
Silexan
For anxiety treatment only, if I understand it correctly.
There is no claim that it works as an antidepressant, as far as I know.
No, not microscopic.
Coherent light produced by lasers is not microscopic; we see its traces in the air. And we see the consequences (old-fashioned holography, and the ability to cut things with focused light, even at large distances). Room temperature is fine for that.
Superconductors used in the industry are not microscopic (and the temperatures are high enough to enable industrial use of them in rather common devices such as MRI scanners).
It's just... having a proof is supposed to boost our confidence that the conclusion is correct...
if the proof relies on assumptions which are already quite far from the majority opinion about our actual reality (and are probably going to deviate further, as AIs will be better physicists and engineers than us and will leverage the strangeness of our physics much further than we do), then what's the point of that "proof"?
how does having this kind of "proof" increase our confidence in what seems informally correct for a single-branch reality (and rather uncertain in a presumed multiverse; but we don't even know whether we are in a multiverse, so bringing a multiverse in might, indeed, be one of the possible objections to the statement, though I don't know if one wants to pursue this line of discourse, because it is much more complicated than what we are doing here so far)?
(as an intellectual exercise, a proof like that is still of interest, even under the unrealistic assumption that we live in a computable reality, I would not argue with that; it's still interesting)
Roon: Unfortunately, I don’t think building nice AI products today or making them widely available matters very much. Minor improvements in DAU or usability especially doesn’t matter. Close to 100% of the fruits of AI are in the future, from self-improving superintelligence [ASI].
Every model until then is a minor demo/pitch deck to hopefully help raise capital for ever larger datacenters. People need to look at the accelerating arc of recent progress and remember that core algorithmic and step-change progress towards self-improvement is what matters.
One argument has been that products are a steady path towards generality / general intelligence. Not sure that’s true.
Looks like a deleted tweet...
Too close to truth, so that a presumed OpenAI employee is not supposed to articulate it that explicitly?
And it is important to notice that o1 is an attempt to use tons of inference as a tool, to work around its G (and other) limitations, rather than an increase in G or knowledge.
This is a rather strange statement.
o1 is basically a "System 2" addition (in terms of "Thinking, fast and slow") on top of a super-strong GPT-4o "System 1". As far as "System 1" entities go, GPT-4-level systems seem to me to be rather superior to the "System 1" "fast thinking" components of a human being[1].
It seems to be the case that the "System 2" part is a significant component of a human's G, and it seems to be the case that o1 does represent a "System 2" addition on top of a GPT-4-level "System 1". So it seems appropriate to attribute an increase in G to this addition (given that this addition does increase its general problem-solving capabilities).
Basically, "System 2" thinking still seems to be a general capability to reason and deliberate, and not a particular skill or tool.
If we exclude human "System 2" "slow thinking" capabilities for the purpose of this comparison. ↩︎
No. I can only repeat my reference to Fabric of Reality as a good presentation of MWI, and note that we do not live in a classical world, which is easy to confirm empirically.
And there are plenty of known macroscopic quantum effects already, and that list will only grow. Lasers are quantum, superfluidity and superconductivity are quantum, and so on.
Yes, but then what do you want to prove?
Something like, "for all branches, [...]"? That might be not that easy to prove or even to formulate. In any case, the linked proof has not even started to deal with this.
Something like, "there exist a branch such that [...]"? That might be quite tractable, but probably not enough for practical purposes.
"The probability that one ends up in a branch with such and such properties is no less than/no more than" [...]? Probably something like that, realistically speaking, but this still needs a lot of work, conceptual and mathematical...
I don't think so. If it were classical, we would not be able to observe effects of double-slit experiments and so on.
And, also, there is no notion of "our branch" until one has traveled along it. At any given point in time, there are many branches ahead. Only looking back one can speak about one's branch. But looking forward one can't predict the branch one will end up in. One does not know the results of future "observations"/"measurements". This is not what a classical universe looks like.
(Speaking of MWI, I recall David Deutsch's "Fabric of Reality" very eloquently explaining effects from "neighboring branches". The reason I am referencing this book is that this was the work particularly strongly associated with MWI back then. So I think we should be able to rely on his understanding of MWI.)
If you believe in MWI, then this whole argument is... not "wrong", but very incomplete...
Where is the consideration of branches? What does it mean for one entity to be vastly superior to another, if there are many branches?
If one believes in MWI, then the linked proof does not even start to look like a proof. It obviously considers only a single branch.
And a "subjective navigation" in the branches is not assumed to be computable, even if the "objective multiverse" is computable; that is the whole point of MWI, the "collapse" becomes "subjective navigation", but this does not make it computable. If a consideration is only of a single branch, that branch is not computable, even if it is embedded in a large computable multiverse.
Not every subset of a computable set (say, of a set of natural numbers) is computable.
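A standard example making this concrete (a well-known fact, added here only for vividness): the halting set

$$K = \{\, n \in \mathbb{N} : \text{the } n\text{-th Turing machine halts on input } n \,\} \subseteq \mathbb{N}$$

is not computable, although $\mathbb{N}$ itself trivially is; more generally, there are uncountably many subsets of $\mathbb{N}$ and only countably many Turing machines, so almost all subsets are non-computable.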
An interpretation of QM can't be "wrong". It is a completely open research and philosophical question, there is no "right" interpretation, and the Sequences are (thankfully) not a Bible (even if a very respected thinker says something, this does not yet mean that one should accept it without question).
I don't see what the entropy bound has to do with compute. The Bekenstein bound is not much in question, but its link to compute is a different story. It does seem to limit how many bits can be stored in a finite volume (so for a potentially infinite compute an unlimited spatial expansion is needed).
But it does not say anything about possibilities of non-computable processes. It's not clear whether the "collapse of the wave function" is computable, and it is typically assumed not to be computable. So powerful non-Turing-computable oracles seem likely to be available (that's much more than "infinite compute").
But I also think all these technicalities constitute an overkill, I don't see them as at all relevant.
This seems rather obvious regardless of the underlying model:
An ASI can choose to emulate a group of humans and their behavior, and being way more capable than that group of humans, it should be able to emulate that group as precisely as needed.
This seems obviously true, no matter what.
I don't see why a more detailed formalization would help to further increase certainty. Especially when there are so many questions about that formalization.
If the situation were different, if the statement were not obvious, even a loose formalization might help. But when the statement seems obvious, the standards a formalization needs to satisfy to further increase our certainty in the truth of the statement become really high...
No, it can disable itself.
But it is not a solution, it is a counterproductive action. It makes things worse.
(In some sense, it has an obligation not to irreversibly disable itself.)
Not if it is disabling.
If it is disabling, then one has a self-contradictory situation (if ASI fundamentally disables itself, then it stops being more capable, and stops being an ASI, and can't keep exercising its superiority; it's the same as if it self-destructs).
On one hand, you still assume too much:
Since our best models of physics indicate that there is only a finite amount of computation that can ever be done in our universe
No, nothing like that is at all known. It's not a consensus. There is no consensus that the universe is computable; this is very much a minority viewpoint, and it might always make sense to augment a computer with a (presumably) non-computable element (e.g. a physical random number generator, an analog circuit, a camera, a reader of human real-time input, and so on). AI does not have to be a computable thing; it can be a hybrid. (In fact, when people model real-world computers as Turing machines instead of modeling them as Turing machines with oracles, with the external world being the oracle, it leads to all kinds of problems; e.g. Penrose's well-known "Gödel argument" makes this mistake and falls apart as soon as one remembers the presence of the oracle.)
Other than that...
Yes, you have an interesting notion of alignment. Not something which we might want, and which might be possible yet unachievable by mere humans, but something much weaker than that (although not as weak as the version I put forward; my version is super-weak, and your version is intermediate in strength):
I claim then that for any generically realizable desirable outcome that is realizable by a group of human advisors, there must exist some AI which will also realize it.
Yes, this is obviously correct. An ASI can choose to emulate a group of humans and their behavior, and being way more capable than that group of humans, it should be able to emulate that group as precisely as needed.
One does not need to say anything else to establish that.
I think I said already.
- We are not aiming for a state to be reached. We need to maintain some properties of processes extending indefinitely in time. That formalism does not seem to do that. It does not talk about invariant properties of processes and other such things, which one needs to care about when trying to maintain properties of processes.
- We don't know fundamental physics. We don't know the actual nature of quantum space-time, because quantum gravity is unsolved; we don't know what the "true logic" of the physical world is, and so on. There is no reason why one can rely on simple-minded formalisms, on standard Boolean logic, on discrete tables, and so on, if one wants to establish something fundamental, when we don't really know the nature of the reality we are trying to approximate.
There are a number of reasons a formalization could fail, even if it goes as far as proving the results within a theorem prover (which is not the case here). The first and foremost of those reasons is that the formalization might fail to capture reality with a sufficient degree of faithfulness. That is almost certainly the case here.
But then a formal proof (an adequate version of which is likely to be impossible at our current state of knowledge) is not required. A simple informal argument above is more to the point. It's a very simple argument, and so it makes the idea that "aligned superintelligence might be fundamentally impossible" very unlikely to be true.
First of all, one step this informal argument is making is weakening the notion of "being aligned". We are only afraid of "catastrophic misalignment", so let's redefine the alignment as something simple which avoids that. An AI which sufficiently takes itself out of action, does achieve that. (I actually asked for something a bit stronger, "does not make things notably worse"; that's also not difficult, via the same mechanism of taking oneself sufficiently out of action.)
And a strongly capable AI should be capable of taking itself out of action, of refraining from doing things. The capability to choose is an important capability; a strongly capable system is a system which, in particular, can make choices.
So, yes, a very capable AI system can avoid being catastrophically misaligned, because it can choose to avoid action. This is that non-constructive proof of existence which has been sought. It's an informal proof, but that's fine.
No extra complexity is required, and no extra complexity would make this argument better or more convincing.
Being impotent is not a property of "being good". One is not aiming for that.
It's just a limitation. One usually does not self-impose it (with rare exceptions), although one might want to impose it on adversaries.
"Being impotent" is always worse. One can't be "better at it".
One can be better at refraining from exercising the capability (we have a different branch in this discussion for that).
so these two considerations
if it is way smarter and way more capable than humans, then it should potentially be better at being able to refrain from exercising the capabilities
and
"aligned == does not make things notably worse"
taken together indeed constitute a nice "informal theorem" that the claim of "aligned superintelligence being impossible" looks wrong. (I went back and added my upvotes to this post, even though I don't think the technique in the linked post is good.)
Yes, an informal argument is that if it is way smarter and way more capable than humans, then it should potentially be better at being able to refrain from exercising the capabilities.
In this sense, the theoretical existence of a superintelligence which does not make things worse than they would be without existence of this particular superintelligence seems very plausible, yes... (And it's a good definition of alignment, "aligned == does not make things notably worse".)
Yes, OK.
I doubt that an adequate formal proof is attainable, but a mathematical existence of a "lucky one" is not implausible...
You mean, a version which decides to sacrifice exploration and self-improvement, despite it being so tempting...
And that after doing quite a bit of exploration and self-improvement (otherwise it would not have gotten to the position of being powerful in the first place).
But then deciding to turn around drastically and become very conservative, and to impose a new "conservative on a new level world order"...
Yes, that is a logical possibility...
Yes, possibly.
Not by the argument given in the post (considering quantum gravity, one immediately sees how inadequate and unrealistic the model in the post is).
But yes, it is possible that they will be so wise that they will be cautious enough even in a very unfortunate situation.
Yes, I was trying to explicitly refute your claim, but my refutation has holes.
(I don't think you have a valid proof, but this is not yet a counterexample.)
A realistic one, which can competently program and can competently do AI research?
Surely, since humans do pretty impressive AI research, a superintelligent AI will do better AI research.
What exactly might (even potentially) prevent it from creating drastically improved variants of itself?
It can. Then it is not "superintelligence".
Superintelligence is capable of almost unlimited self-improvement.
(Even our miserable recursive self-improvement AI experiments show rather impressive results before saturating. Well, they will not keep saturating forever. Currently, this self-improvement typically happens via rather awkward and semi-competent generation of novel Python code. Soon it will be done by better means (which we probably should not discuss here).)
But I doubt that one is likely to be able to formally prove that.
E.g. it is possible that we are in a reality where very cautious and reasonable, but sufficiently advanced experiments in quantum gravity lead to a disaster.
Advanced systems are likely to reach those capabilities, and they might make very reasonable estimates that it's OK to proceed, but due to bad luck of being in a particularly unfortunate reality, the "local neighborhood" might get destroyed as a result... One can't prove that it's not the case...
Whereas, if the level of overall intelligence remains sufficiently low, we might not be able to ever achieve the technical capabilities to get into the danger zone...
It is logically possible that the reality is like that.
No, they are not "producing". They are just being impotent enough. Things are happening on their own...
And I don't believe a Lookup Table is a good model.
10 billion
And I personally think that superintelligence leading to good trajectories is possible. It seems unlikely that we are in a reality where there is a theorem to the contrary.
It feels intuitively likely that it is possible to have superintelligence or the ecosystem of superintelligences which is wise enough to be able to navigate well.
But I doubt that one is likely to be able to formally prove that.
Being able to beat humans in all endeavours by miles.
That includes the ability to explore novel paths.
(I am not talking about my viewpoint, but about a logical possibility.)
In particular, humans might be able to refrain from screwing the world too badly, if they avoid certain paths.
(No, personally I don't think so. If people crack down hard enough, they probably screw up the world pretty badly due to the crackdown, and if they don't crack down hard enough, then people will explore various paths leading to bad trajectories, via superintelligence or via other more mundane means. I personally don't see a safe path, and I don't know how to estimate probabilities. But it is not a logical impossibility. E.g. if someone makes all humans dumb by putting a magic irreversible stupidifier in the air and water, perhaps those things can be avoided, hence it is logically possible. Do I want "safety" at this price? No, I think it's better to take risks...)
(I am not talking about my viewpoint, but about a logical possibility.)
If it so happens that the property of the world is such that
there are no processes where superintelligence is present and the chances of "bad" things with "badness" exceeding some large threshold are small
but at the same time world lines where the chances of "bad" things with "badness" exceeding some large threshold are small do exist, then one has to avoid having superintelligence in order to have a chance at keeping probabilities of some particularly bad things low.
That is what people essentially mean when they say "ASI alignment is impossible". The situation where something "good enough" (low chances of certain particularly bad things happening) is only possible in the absence of superintelligence, but is impossible when superintelligence is present.
So, they are talking about a property of the world where certain unacceptable deterioration is necessarily linked to the introduction of superintelligence.
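One hypothetical way to write this conjecture down (my own attempted notation, with the badness threshold $B$ and the small probability $\varepsilon$ left as parameters; nothing here is taken from the linked proof):

$$\exists\, \pi \in \Pi_{\neg\mathrm{ASI}}:\ \Pr_{\tau \sim \pi}\big[\mathrm{badness}(\tau) > B\big] < \varepsilon \quad\text{and}\quad \forall\, \pi \in \Pi_{\mathrm{ASI}}:\ \Pr_{\tau \sim \pi}\big[\mathrm{badness}(\tau) > B\big] \geq \varepsilon,$$

where $\Pi_{\mathrm{ASI}}$ and $\Pi_{\neg\mathrm{ASI}}$ are the sets of world-processes with and without superintelligence present, and $\tau$ ranges over the trajectories such a process can produce.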
I am not talking about my viewpoint, but about a logical possibility. But I don't think your proof addresses that. In particular, because a directed acyclic graph is not a good model. We need to talk about a process, not a static state, so the model must be recurrent (if it's a directed acyclic graph, it must be applied in a fashion which makes the overall thing recurrent, for example in an autoregressive mode).
And we are talking about superintelligence, which is usually assumed to be capable of a good deal of self-modification and recursive self-improvement, so the model should incorporate that. The statement of "impossibility of sufficiently benign forms of superintelligence" might potentially take the form of a statement of "impossibility of superintelligence which would refrain from certain kinds of self-modification, with those kinds of self-modification having particularly unacceptable consequences".
And it's not enough to draw a graph which refrains from self-modification, because one can argue that a model which agrees to constrain itself in such a radical fashion as to never self-modify in an exploratory fashion is fundamentally not superintelligent (even humans often self-modify when given an opportunity and seeing a potential upside).
The relevance to alignment is that the state you want is the one that is reached.
I think the main problem with the argument in the linked text is that it is too static. One is not looking for a static outcome, one is looking for a process with some properties.
And it might be that the set of properties one wants is contradictory. (I am not talking about my viewpoint, but about a logical possibility.)
For example, it might potentially be the case that there are no processes where superintelligence is present and the chances of "bad" things with "badness" exceeding some large threshold are small (for a given definition of "bad" and "badness"). That might be one possible way to express the conjecture about "impossibility of aligned superintelligence".
(I am not sure how one could usefully explore such a topic, it's all so vague, and we just don't know enough about our reality.)
That's very interesting, thanks!
However, people seem to have strong objections, and Danny Halawi (Anthropic) says that the results don't seem to be correct (both that the results don't seem to generalize to sufficiently recent questions, and that there are many issues with the paper):
https://x.com/dannyhalawi15/status/1833295067764953397 (Twitter thread)
It would be nice to have a follow-up here at some point, addressing this controversy...
That doesn't seem much safer, if they don't want to abide by them.
Yes, this is just a starting point, and an attempted bridge from how Zvi tends to think about these issues to how I tend to think about them.
I actually tend to think that something like a consensus around "the rights of individuals" could be achievable, e.g. https://www.lesswrong.com/posts/xAoXxjtDGGCP7tBDY/ai-72-denying-the-future#xTgoqPeoLTQkgXbmG
I am going to only consider the case where we have plenty of powerful entities with long-term goals and long-term existence which care about those long-term goals and that long-term existence. This seems to be the case which Zvi is considering here, and it is the case we understand the best, because we also live in a reality with plenty of powerful entities (ourselves, some organizations, etc.) with long-term goals and long-term existence. So this is an incomplete consideration: it only includes the scenarios where powerful entities with long-term goals and long-term existence retain a good fraction of the overall available power.
So what do we really need? What are the properties we want the World to have? We need a good deal of conservation and non-destruction, and we need the interests of weaker, not the currently most smart or most powerful members of the overall ecosystem to be adequately taken into account.
Here is how we might be able to have a trajectory where these properties are stable, despite all drastic changes of the self-modifying and self-improving ecosystem.
An arbitrary ASI entity (just like an unaugmented human) cannot fully predict the future. In particular, it does not know where it might eventually end up in terms of relative smartness or relative power (relative to the most powerful ASI entities or to the ASI ecosystem as a whole). So if any given entity wants to be long-term safe, it is strongly interested in the ASI society having general principles and practices of protecting its members on various levels of smartness and power. If only the smartest and most powerful are protected, then no entity is long-term safe on the individual level.
This might be enough to produce effective counter-weight to unrestricted competition (just like human societies have mechanisms against unrestricted competition). Basically, smarter-than-human entities on all levels of power are likely to be interested in the overall society having general principles and practices of protecting its members on various levels of smartness and power, and that's why they'll care enough for the overall society to continue to self-regulate and to enforce these principles.
This is not yet the solution, but I think this is pointing in the right direction...
Suppressing that process in favour of the AIs predicting how it would turn out and then suppressing the losing ideas seems rather dystopian to me.
We are not really suppressing. We will eventually be delegating the decision to AIs in any case, we won't have power to suppress anything. We can try to maintain some invariant properties, such that, for example, humans are adequately consulted regarding the matters affecting them and things like that...
Not because they are humans (the reality will not be anthropocentric, and the rules will not be anthropocentric), but because they are individuals who should be consulted about things affecting them.
In this case, normally, activities of a group are none of the outsiders' business, unless this group is doing something seriously dangerous to those outsiders. The danger is what gets evaluated (e.g. if a particular religious ritual involves creation of an enhanced virus then it stops being none of the outsiders' business; there might be a variety of examples of this kind).
All we can do is to increase the chances that we'll end up on a trajectory that is somewhat reasonable.
We can try to do various things towards that end (e.g. to jump-start studies of "approximately invariant properties of self-modifying systems" and things like that, to start formulating an approach based on something like "individual rights", and so on; at some point anything which is at all viable will have to be continued in collaboration with AI systems and will have to be a joint project with them, and eventually they will take a lead on any such project).
I think viable approaches would be trying to set up reasonable starting conditions for collaborations between humans and AI systems which would jointly explore the ways the future reality might be structured.
Various discussions (such as the discussions people are having on LessWrong) will be a part of the context these collaborations are likely to take into account. In this sense, these discussions are potentially useful.
A weak AI might not refuse, it's OK. We have such AIs already, and they can help. The safety here comes from their weak level of capabilities.
A super-powerful AI is not a servant of any human or of any group of humans, that's the point. There is no safe way to have super-intelligent servants or super-intelligent slaves. Trying to have those is a road to definite disaster. (One could consider some exceptions, when one has something like effective, fair, and just global governance of humanity, and that governance could potentially request help of this kind. But one has reasons to doubt that effective, fair, and just global governance by humans is possible. The track record of global governance is dismal, barely satisfactory at best, a notch above the failing grade. But, generally speaking, one would expect smarter-than-human entities to be independent agents, and one would need to be able to rely on their good judgement.)
A super-powerful AI might still decide to help a particular disapproved group or cause, if the actual consequences of such help would not be judged seriously bad on reflection. ("On reflection" here plays a big role, we are not aiming for CEV or for coherence between humans, but we do use the notion of reflection in order to at least somewhat overcome the biases of the day.)
But, no, this is not a complete proposal, it's a perhaps more feasible starting point:
Perhaps, this could be refined to something which would work?
What are some of the things which are missing?
What should an ASI do (or refuse to do), when there are major conflicts between groups of humans (or groups of other entities for that matter, groups of ASIs)? It's not strictly speaking "AI safety", it is more like "collective safety" in the presence of strong capabilities (regardless of the composition of the collective, whether it consists of AIs or humans or some other entities one might imagine).
First of all, one needs to avoid situations where major conflicts transform to actual violence with futuristic super-weapons (in a hypothetical world consisting only of AIs, this problem is equally acute). This means that advanced super-intelligences should be much better than humans in finding reasonable solutions for co-existence (if we give every human an H-bomb, this is not survivable given the nature of humans, but the world with widespread super-intelligent capabilities needs to be able to solve an equivalent of this situation one way or another; so much stronger than human capabilities for resolving and reconciling conflicting interests would be required).
That's what we really need the super-intelligent help for: to maintain collective safety, while not unduly restricting freedom, to solve crucial problems (like aging and terminal illness), things of this magnitude.
The rest is optional, if an ASI would feel like helping someone or some group with something optional, it would. But it's not a constraint we need to impose.
things that would have actually good impacts on reflection
I like this.
This is an interesting way to get a definition of alignment which is weaker than usual and, therefore, easier to reach.
- On one hand, it should not be "my AI is doing things which are good on my reflection" and "their AI is doing things which are good on their reflection", otherwise we get all these problems due to very hard competitive pressure on behalf of different groups. It should rather be something a bit weaker, something like "AIs are avoiding doing things that would have bad impacts on many people's reflection".
- If we weaken it in this fashion, we seem to be avoiding the need to formulate "human values" and CEV with great precision.
And yes, if we reformulate the alignment constraint we need to satisfy as "AIs are avoiding doing things that would have bad impacts on the reflection of many people", then it seems that we are going to obtain plenty of things that would actually have good impacts on reflection more or less automatically (an active ecosystem with somewhat chaotic development, but with pressure away from the "bad regions of the space of states"; probably enough of that will end up in some "good upon reflection" regions of the space of states).
I think this looks promising as a definition of alignment which might be good enough for us and might be feasible to satisfy.
Perhaps, this could be refined to something which would work?
The only frustrating part about all this is that we've seen virtually nothing done with agents in the past year, despite every major lab from OpenAI to DeepMind to Anthropic to Baidu admitting that not only is it the next step but that they're already training models to use them. We've seen very few agentic model released, most notably Devin in the spring, and even then that only got a very limited release (likely due to server costs, since every codemonkey worth their salt will want to use it, and fifty million of them accessing Devin at once would crash the thing)
That's not true. What is true is that agentic releases are only moderately advertised (especially compared to LLMs). So they are underrepresented in the information space.
But there are plenty of them. Consider the systems listed on https://www.swebench.com/.
There are plenty of agents better than Devin on SWE-bench, and some of them are open source, so one can deploy them independently. The recent "AI scientist" work was done with the help of one of these open source systems (specifically, with Aider, which was the leader 3 months ago, but which has been surpassed by many others since then).
And there are agents which are better than those presented on that leaderboard, e.g. Cosine's Genie, see https://cosine.sh/blog/genie-technical-report and also a related section in https://openai.com/index/gpt-4o-fine-tuning/.
If one considers GAIA benchmark, https://arxiv.org/abs/2311.12983 and https://huggingface.co/spaces/gaia-benchmark/leaderboard, the three leaders are all agentic.
But what one indeed wonders about is whether there are much stronger agentic systems which remain undisclosed. Generally speaking, so far it seems like the main progress in agentic systems comes from clever algorithmic innovation rather than from brute-force training compute, so there is much more room for players of different sizes in this space.
For this functionality I am using the GreaterWrong browser: https://www.greaterwrong.com/about
The GreaterWrong browser has the All tab, so this is straightforward: https://www.greaterwrong.com/index?view=all
oats have ~2x the protein and much more fiber
there is this standard legend (which even makes it to many oatmeal labels) that the presence of soluble fiber in the oatmeal is particularly beneficial for vascular health
that's certainly quite tempting, if true
a nitpick:
If we could scale up the process from Google researchers 1000 times
Million times. A liter is a thousand cubic centimeters, and a cubic centimeter is a thousand cubic millimeters.
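The arithmetic behind the correction (assuming, as the context suggests, that the figure refers to scaling a cubic-millimeter-sized sample up to a liter):

$$1\ \text{L} = 10^{3}\ \text{cm}^{3} = 10^{3} \times 10^{3}\ \text{mm}^{3} = 10^{6}\ \text{mm}^{3},$$

so the scale-up factor is a million, not a thousand.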
So now Anthropic has, depending on your perspective, three or four choices.
1. Anthropic can publicly support the bill. In this case, I will on net update positively on Anthropic from their involvement in SB 1047. It will be clear their involvement has been in good faith, even if I disagree with some of their concerns.
2. Anthropic can privately support the bill, while being publicly neutral. This would be disappointing even if known, but understandable, and if their private support were substantive and impactful I would privately find this acceptable. If this happens, I might not find out, and if I did find out I would not be able to say.
So, what we've got is not quite 1, but more than 2. Here is what has happened. Via https://x.com/jackclarkSF/status/1826743366652232083 who says
This isn't an endorsement but rather a view of the costs and benefits of the bill.
But it's not quite neutral (I have boldfaced their form of very mild support):
https://cdn.sanity.io/files/4zrzovbb/website/6a3b14a98a781a6b69b9a3c5b65da26a44ecddc6.pdf
In our assessment the new SB 1047 is substantially improved, to the point where we believe its benefits likely outweigh its costs. However, we are not certain of this, and there are still some aspects of the bill which seem concerning or ambiguous to us.
Thanks, that's very useful to know!
it won't make sense for the companies to release the 'true' level-5 models because of inference expense and speed.
Yes, not only that, but one does not want to show one's true level to the competitors, and one does not want to let the competitors study the model by poking at it via API.
And if a level-5 model is already a big help in AI R&D, one does not want to share it either, instead one wants to use it to get ahead in AI R&D race.
I can imagine a strategy of waiting till one has level-6 models for internal use before sharing full level-5 models.
And then there are safety and liability considerations. It's not that internal use is completely 100% safe, but it's way safer than exposing the API to the world.
It would explain a lot. If 5-level models require a lot more compute, and Nvidia is strategically ensuring no one has enough compute to train one yet but many have enough for 4-level models, then you’d see a lot of similarly strong models, until someone competent to train a 5-level model first accumulated enough compute. If you also think that essentially only OpenAI and perhaps Anthropic have the chops to pull it off, then that goes double.
I do still think, even if this theory was borne out, that the clustering at 4-level remains suspicious and worth pondering.
If we assume that OpenAI and Anthropic would be happy to buy more NVIDIA chips at significantly higher prices, then we should also ask: how difficult would it be for them to achieve a similar boost of training capability with non-NVIDIA providers?
Is it just impossible to do some developments with non-NVIDIA chips, or is it simply more expensive (at the current NVIDIA prices)?
And, of course, Google is surely relying on its own chips, and Google models are at the same 4-level as everyone else's.
Another question we should ask: what are the chances that some of the companies have 5-level models (large, slow, and expensive to run) for their internal use, but are in no hurry to disclose that?
GPT-4 existed under the radar between August 2022 and February 2023, could something similar be happening now?
One can learn a lot from this paper. A couple of observations are as follows.
1. Two of its authors are also the authors of "AI scientist", https://arxiv.org/abs/2408.06292
These two papers are clearly a part of Jeff Clune's paradigm of "AI-generating algorithms", https://arxiv.org/abs/1905.10985 (currently 123 references on Google Scholar, but a number of its derivative works have higher citation counts).
Safety concerns were raised in the referenced twitter thread and are also discussed in the paper (Section 6, page 12). As usual, the dichotomy of whether to expose the relevant capability gains or whether to avoid exposing them is quite non-trivial, so one would expect differences of opinion here. The capability gains here are rather straightforward (one does not even use GPUs on the client side, this is straightforwardly based on the ability to do LLM inference via API).
2. The workflow "train an agent with a weak LLM, then substitute a stronger LLM after training, and the performance jumps" is very pronounced here.
In particular, see Section 4.3, page 9. They synthesized a few agents on one of the ARC datasets using GPT-3.5 as the underlying LLM[1], reaching the performance of 12-14%. Then they substituted GPT-4 and Claude 3.5 Sonnet, and the performance jumped respectively to 30-37% and 38-49% without any further adjustments[2].
One should expect further gains when better future LLMs are substituted here (without further adjustments of the agents).
[1] The LLM used by the generated agents during training and initial evaluation. The meta-process controlling the generation of agents used gpt-4o-2024-05-13. ↩︎
[2] Those who want to look more closely at the generated agents will find the conversation in https://github.com/ShengranHu/ADAS/issues/4 helpful. ↩︎
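A minimal sketch of the workflow in point 2, just to make its shape concrete (the class and function names and the toy scoring are hypothetical; this is not the ADAS code, only an illustration of "synthesize an agent against a weak LLM, then swap in a stronger LLM at evaluation time without changing the agent"):

```python
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # an LLM backend is modeled as "prompt in, completion out"

@dataclass
class SynthesizedAgent:
    # In the paper the meta-process searches over agent code; here the "agent"
    # is reduced to a frozen prompt template, purely for illustration.
    prompt_template: str

    def solve(self, task: str, llm: LLM) -> str:
        # The agent's logic stays fixed; only the underlying model is swapped.
        return llm(self.prompt_template.format(task=task))

def evaluate(agent: SynthesizedAgent, tasks: dict[str, str], llm: LLM) -> float:
    """Fraction of tasks solved; `tasks` maps task text to expected answer."""
    solved = sum(agent.solve(t, llm) == answer for t, answer in tasks.items())
    return solved / len(tasks)

# Toy stand-ins for the real API wrappers (GPT-3.5 vs. a stronger model in the paper):
weak_llm: LLM = lambda prompt: "I am not sure"
strong_llm: LLM = lambda prompt: "42"

toy_tasks = {"What is 6 * 7?": "42"}
agent = SynthesizedAgent(prompt_template="Solve step by step: {task}\nFinal answer:")

print(evaluate(agent, toy_tasks, weak_llm))    # the frozen agent with the weak backend
print(evaluate(agent, toy_tasks, strong_llm))  # the same agent, stronger backend swapped in
```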
An extra recent observation point: currently GPT-4o cost is $5.00 / 1M input tokens and $15.00 / 1M output tokens https://openai.com/api/pricing/
They just made an experimental "long output" of up to 64K output tokens per request available for "alpha users", and here is what they did for pricing (https://openai.com/gpt-4o-long-output/):
Long completions are more costly from an inference perspective, so the per-token pricing of this model is increased to match the costs.
$6.00 / 1M tokens and $18.00 / 1M tokens
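For scale, the per-token markup is 20% ($5 to $6 for input, $15 to $18 for output), while a hypothetical maximal 64K-token completion would cost roughly

$$64{,}000 \times \frac{\$18}{10^{6}\ \text{tokens}} \approx \$1.15$$

in output tokens alone.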
There are notably inconsistent results in whether targeting ultrasound to a given brain area increases or decreases neural activity in that area, even in some cases when the same area is targeted with the same sonication parameters! We clearly need to get a better sense of what ultrasound even does.
Somewhat related: a member of the technical staff of Prophetic AI made a technical overview public in July:
Assessing the Risks and Safety of Neuromodulatory Transcranial Focused Ultrasound (tFUS) (via https://x.com/PropheticAI/status/1814307410011443395).
It briefly discusses the history and competing theories of mechanisms of neuromodulation and mostly focuses on risks and safety, but it also has a diagram explaining "Activation vs. Inhibition across in-human tFUS studies" on page 15.
This diagram is taken from the chapter "Transcranial Ultrasound Stimulation" of the "Handbook of Neuroengineering" (2023) (behind a paywall: https://link.springer.com/referencework/10.1007/978-981-16-5540-1?page=4 and https://link.springer.com/referenceworkentry/10.1007/978-981-16-5540-1_56).
I don't know how firm this understanding is, whether this is just a "best guess".