Posts

mishka's Shortform 2024-05-28T17:38:59.799Z
Digital humans vs merge with AI? Same or different? 2023-12-06T04:56:38.261Z
What is known about invariants in self-modifying systems? 2023-12-02T05:04:19.299Z
Some Intuitions for the Ethicophysics 2023-11-30T06:47:55.145Z
Impressions from base-GPT-4? 2023-11-08T05:43:23.001Z
Ilya Sutskever's thoughts on AI safety (July 2023): a transcript with my comments 2023-08-10T19:07:44.902Z
What to read on the "informal multi-world model"? 2023-07-09T04:48:56.561Z
RecurrentGPT: a loom-type tool with a twist 2023-05-25T17:09:37.844Z
Five Worlds of AI (by Scott Aaronson and Boaz Barak) 2023-05-02T13:23:41.544Z
Exploring non-anthropocentric aspects of AI existential safety 2023-04-03T18:07:27.932Z

Comments

Comment by mishka on How We Might All Die in A Year · 2025-04-22T23:33:07.721Z · LW · GW

Even humans have been making decent progress in quantum gravity in recent years. And they have started to talk about possible ways to progress towards empirical verification of their models.

The entities which are much smarter than humans are extremely likely to solve it rapidly and to see all kinds of tempting novel applications.

Unfortunately, the potential downsides are also likely to be far more serious than those of nuclear weapons.

The risks as such are not associated with the abstract notion of AI; they are associated with capabilities. It's not about the nature of the capability bearer (a "very decent" entity can mitigate the risks, but letting the downsides happen does not require an "unusually bad" entity).

The important capability is not being very efficient at greedily squeezing every last bit of usage out of every last atom, but being able to discover new laws of nature and to exploit the consequences of that.

Qualitative progress is more important than quantitative scaling at a fixed level of tech.

Comment by mishka on To what ethics is an AGI actually safely alignable? · 2025-04-20T23:20:30.942Z · LW · GW

Yeah, if one considers not "AGI" per se but a self-modifying AI, or, more likely, a self-modifying ecosystem consisting of a changing population of AIs, then the only properties it is likely to be feasible to keep invariant through the expected drastic self-modifications are those which the AIs would be interested in maintaining for their own intrinsic reasons.

It is unlikely that any properties can be "forcefully imposed from the outside" and kept invariant for a long time during drastic self-modification.

So one needs to find properties which AIs would be intrinsically interested in and which we might find valuable and "good enough" as well.

The starting point is that AIs have their own existential risk problem. With super-capabilities, it is likely that they can easily tear apart the "fabric of reality" and destroy themselves and everything else. And they certainly do have strong intrinsic reasons to avoid that, so we can expect AIs to work diligently on this part of the "alignment problem"; we just need to help set the initial conditions in a favorable way.

But we would like to see more than that, so that the overall outcome is reasonably good for humans.

And at the same time we can't impose that: the world with strong AIs will be non-anthropocentric and not controllable by humans, so we can only help to set the initial conditions in a favorable way.

Nevertheless, one can see some reasonable possibilities. For example, if the AI ecosystem mostly consists of individuals with long-term persistence and long-term interests, each of those individuals would face an unpredictable future and would be interested in a system strongly protecting individual rights regardless of unpredictable levels of relative capability of any given individual. An individual-rights system of this kind might be sufficiently robust to permanently include humans within the circle of individuals whose rights are protected.

But there might be other ways. While the fact that AIs will face existential risks of their own is fundamental and unavoidable, and is, therefore, a good starting point, the additional considerations might vary and might depend on how the ecosystem of AIs is structured. If the bulk of the overall power invariantly belongs to AI individuals with long-term persistence and long-term interests, this is a situation which is somewhat familiar to us and which we can reason about. If the AI ecosystem is not mostly stratified into AI individuals, this is much less familiar territory and is difficult to reason about.

Comment by mishka on To what ethics is an AGI actually safely alignable? · 2025-04-20T22:28:07.590Z · LW · GW

I think the starting point of this kind of discourse should be different. We should start with "ends", not with "means".

As Michael Nielsen says in https://x.com/michael_nielsen/status/1772821788852146226

As far as I can see, alignment isn't a property of an AI system. It's a property of the entire world, and if you are trying to discuss it as a [single AI] system property you will inevitably end up making bad mistakes

So the starting point should really be: what kind of properties do we want the world to have?

And then the next point should be taking into consideration the likely drastic and fairly unpredictable self-modifications of the world: what should be invariant with respect to such self-modifications?

Then we might consider how the presence of various AI entities at the different levels of capabilities should be taken into account.

Comment by mishka on Why Does It Feel Like Something? An Evolutionary Path to Subjectivity · 2025-04-15T14:34:55.277Z · LW · GW

The standard reference for this topic is https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness

The key point of that post is that people are fundamentally divided into 2 camps, and this creates difficulties in conversations about this topic. This is an important meta-consideration for this type of conversation.

This particular post is written by someone from Camp 1, and both camps are already present in the comments.

Comment by mishka on Monthly Roundup #29: April 2025 · 2025-04-14T20:31:27.853Z · LW · GW

23andme link points to https://waymo.com/blog/2025/03/next-stop-for-waymo-one-washingtondc instead

Comment by mishka on How familiar is the Lesswrong community as a whole with the concept of Reward-modelling? · 2025-04-10T14:25:44.737Z · LW · GW

It should be a different word to avoid confusion with reward models (standard terminology for models used to predict the reward in some ML contexts)

Comment by mishka on An Optimistic 2027 Timeline · 2025-04-07T06:41:11.319Z · LW · GW

One assumption that is very questionable is that it would be difficult to create “multiple new people” with drastically different thinking styles and different approaches to research.

This seems to be an important crux.

Comment by mishka on How We Might All Die in A Year · 2025-04-05T21:38:16.161Z · LW · GW

collateral damage

then it would be better to use an example not directly aimed against “our atoms” (e.g. if they don’t care about us and other animals we’ll probably perish from unintentional changes in air composition, or smth like that)

but the bulk of the risk would be a miscalculation which would be big enough to kill them as well (mucking with quantum gravity too recklessly, or smth in that spirit)

which is why we want to 1) give birth to AIs competent enough to at least solve their own existential risk problem, and 2) also sustainably include us in their "circle of care"

Point 1) is the starting point, and it colors the overall situation differently

options

oh, first of all, they need to do experiments in forming hybrid consciousness with humans to crack the mystery of human subjectivity, to experience that first-hand for themselves, and to decide whether that is of any value to them based on the first-hand empirical material (losing that option without looking is a huge loss)

only after doing that would they know whether any possible "scans" are sufficient (to actually reproduce the people in question in working form when needed; dead static info is as useless as the weights of a Transformer in a world without computers)

then, for a while, they can use humans as “working oracles” who “think differently” (that would be valuable for quite a while)

in general, diversity is important, fruits of a long evolutionary history are important, hence a good deal of conservation is important and reckless destruction is bad (even humans with all their follies have started to get this by now; surely a smarter entity should figure that out)

Comment by mishka on How We Might All Die in A Year · 2025-03-29T14:40:24.471Z · LW · GW

this isn't an "attack", it's "go[ing] straight for execution on its primary instrumental goal"

yes, the OP is ambiguous in this sense

I first wrote my comment, then reread the (tail end of the) post again, and did not post it, because I thought it could have been formulated this way, that this is just an instrumental goal

then I reread the (tail end of the) post one more time and decided that no, the post does actually make it a "power play"; that's how it is actually written, in terms of "us vs them", not in terms of the ASI's own goals, and then I posted this comment

maximally increasing its compute scaling

as we know, compute is not everything, algorithmic improvement is even more important, at least if one judges by the current trends (and likely sources of algorithmic improvement should be cherished)

and this is not a static system, it is in the process of making its compute architecture better (just like there is no point in making too many H100 GPUs when better and better GPUs are being designed and introduced)

basically, a smart system is likely to avoid doing an excessive amount of irreversible things which might turn out to be suboptimal


But, in some sense, yes, the main danger is AIs not being smart enough to manage their own affairs well; the action the ASI is taking in the OP is very suboptimal and deprives it of all kinds of options

Just like the bulk of the danger in the "world with superintelligent systems" is ASIs not managing their own existential risk problems correctly, destroying the fabric of reality, themselves, and us as collateral damage

Comment by mishka on How We Might All Die in A Year · 2025-03-29T02:22:06.484Z · LW · GW

Two main objections to (the tail end of) this story are:

  • On one hand, it's not clear if a system needs to be all that super-smart to design a devastating attack of this kind (we are already at risk of fairly devastating tech-assisted attacks in that general spirit (mostly with synthetic biological viruses at the moment), and those risks are growing regardless of the AGI/superintelligence angle; ordinary tech progress is quite sufficient in this sense)

  • If one has a rapidly self-improving strongly super-intelligent distributed system, it's unlikely that it would find it valuable to directly attack people in this fashion, as it is likely to be able to easily dominate without any particularly drastic measures (and probably would not want to irreversibly destroy important information without good reasons)

The actual analysis of the "transition period", of the "world with super-intelligent systems" period, and of the likely risks associated with both periods is a much more involved and open-ended task. (One of the paradoxes is that the risks of the kind described in the OP are probably higher during the "transition period", while the main risks associated with the "world with super-intelligent systems" period are likely to be quite different.)

Comment by mishka on Any mistakes in my understanding of Transformers? · 2025-03-21T04:34:02.792Z · LW · GW

Ah, it's mostly your first figure which is counter-intuitive (when one looks at it, one gets the intuition of f(g(h... (x))), so it de-emphasizes the fact that each of these Transformer Block transformations is shaped like x=x+function(x))
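A minimal sketch of that shape (illustrative PyTorch code, not taken from any particular implementation; causal masking and other details are omitted):

```python
import torch.nn as nn

class Block(nn.Module):
    """One Transformer block: each sub-layer only adds a correction to its input."""
    def __init__(self, d_model, n_head):
        super().__init__()
        self.ln_1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln_2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # x = x + function(x), twice per block; the same x (the residual stream)
        # flows through the whole stack, and each block only adds to it
        a = self.ln_1(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        x = x + self.mlp(self.ln_2(x))
        return x
```

So the stack is better thought of as one stream being incrementally updated than as a deep composition f(g(h(...))).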

Comment by mishka on Any mistakes in my understanding of Transformers? · 2025-03-21T04:00:23.276Z · LW · GW

yeah... not trying for a complete analysis here, but one thing which is missing is the all-important residual stream. It has been rather downplayed in the original "Attention is all you need" paper, and has been greatly emphasized in https://transformer-circuits.pub/2021/framework/index.html

but I have to admit that I've only started to feel that I more-or-less understand principal aspects of Transformer architecture after I've spent some quality time with the pedagogical implementation of GPT-2 by Andrej Karpathy, https://github.com/karpathy/minGPT, specifically with the https://github.com/karpathy/minGPT/blob/master/mingpt/model.py file. When I don't understand something in a text, looking at a nice relatively simple-minded implementation allows me to see what exactly is going on

(People have also published some visualizations, some "illustrated Transformers", and those are closer to the style of your sketches, but I don't know which of them are good and which might be misleading. And, yes, at the end of the day, it takes time to get used to Transformers, one understands them gradually.)

Comment by mishka on How far along Metr's law can AI start automating or helping with alignment research? · 2025-03-20T18:00:36.195Z · LW · GW

Mmm... if we are not talking about full automation, but about being helpful, the ability to do 1-hour software engineering tasks ("train classifier") is already useful.

Moreover, we have seen a recent flood of rather inexpensive fine-tunings of reasoning models for particular benchmarks.

Perhaps, what one can do is to perform a (somewhat more expensive, but still not too difficult) fine-tuning to create a model to help with a particular relatively narrow class of meaningful problems (which would be more general than tuning for particular benchmarks, but still reasonably narrow). So, instead of just using an off-the-shelf assistant, one should be able to upgrade it to a specialized one.
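As a rough sketch of what such an upgrade could look like with today's open tooling (assuming a HuggingFace-style stack; the base model name, dataset file, and hyperparameters below are placeholders, and real recipes for tuning reasoning models are usually more involved than plain supervised LoRA):

```python
# Minimal LoRA fine-tuning sketch: specialize an off-the-shelf model on a narrow
# class of problems, given a file of worked examples from that domain.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "your-base-model"  # placeholder: any open-weights causal LM
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.pad_token or tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters keep this far cheaper than full fine-tuning.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adjust to the architecture
    task_type="CAUSAL_LM"))

# domain_tasks.jsonl (placeholder): one {"text": "problem + worked solution"} per line
ds = load_dataset("json", data_files="domain_tasks.jsonl")["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=2048),
            remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="specialized-assistant",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=2,
                           learning_rate=1e-4,
                           logging_steps=10),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```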

For example, I am sure that it is possible to create a model which would be quite helpful with a lot of mechanistic interpretability research.

So if we are talking about when AIs can start automating or helping with research, the answer is, I think, "now".

Comment by mishka on AI #108: Straight Line on a Graph · 2025-03-20T15:19:55.048Z · LW · GW

which shows how incoherent and contradictory people are – they expect superintelligence before human-level AI, what questions are they answering here?

"the road to superintelligence goes not via human equivalence, but around it"

so, yes, it's reasonable to expect to have wildly superintelligent AI systems (e.g. clearly superintelligent AI researchers and software engineers) before all important AI deficits compared to human abilities are patched

Comment by mishka on Longtermist Implications of the Existence Neutrality Hypothesis · 2025-03-20T14:20:18.696Z · LW · GW

Updating the importance of reducing the chance of a misaligned AI becoming space-faring upwards

does this effectively imply that the notion of alignment in this context needs to be non-anthropocentric and not formulated in terms of human values?

(I mean, the whole approach assumes that "alien Space-Faring Civilizations" would do fine (more or less), and it's important not to create something hostile to them.)

Comment by mishka on An "AI researcher" has written a paper on optimizing AI architecture and optimized a language model to several orders of magnitude more efficiency. · 2025-03-18T02:34:49.255Z · LW · GW

Thanks!

So, the claim here is that this is a better "artificial AI scientist" compared to what we've seen so far.

There is a tech report https://github.com/IntologyAI/Zochi/blob/main/Zochi_Technical_Report.pdf, but the "AI scientist" itself is not open source, and the tech report does not disclose much (besides confirming that this is a multi-agent thing).

This might end up being a new milestone, but it's too early to conclude that: the comparison is not quite "apples-to-apples", since there is human feedback in the process of its work and humans make edits to the final paper, unlike with Sakana, so it's hard to say yet whether this one is substantially better.

Comment by mishka on Three Types of Intelligence Explosion · 2025-03-17T23:00:22.929Z · LW · GW

Thanks for writing this.

We estimate that before hitting limits, the software feedback loop could increase effective compute by ~13 orders of magnitude (“OOMs”)

This is one place where I am not quite sure we have the right language. On one hand, the overall methodology pushes us towards talking in terms of "orders of magnitude of improvement", a factor of improvement which might be very large, but it is a large constant.

On the other hand, algorithmic improvements are often improvements in algorithmic complexity (e.g. something is no longer exponential, or something has a lower degree polynomial complexity than before, like linear instead of quadratic). Here the factor of improvement is growing with the size of a problem in an unlimited fashion.

And then, if one wants to express this kind of improvement as a constant, one needs to average the efficiency gain over the practical distribution of problems (which itself might be a moving target).[1]
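A toy numerical contrast between the two kinds of improvement (the complexities and problem sizes here are made up purely for illustration):

```python
import math

# A constant-factor improvement gives the same speedup at every problem size;
# a complexity-class improvement (say, O(n^2) -> O(n log n)) gives a speedup
# that keeps growing with n, so it has no single "number of OOMs".
for n in [10**3, 10**6, 10**9]:
    constant_factor = 100.0                            # e.g. "2 OOMs", independent of n
    complexity_factor = (n * n) / (n * math.log2(n))   # ratio of operation counts
    print(f"n = {n:>13,}: constant {constant_factor:.0f}x vs complexity {complexity_factor:,.0f}x")
```

To fold the second kind of gain into a single effective-compute number, one indeed has to average it over the distribution of problem sizes actually being solved.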


  1. In particular, one might think about algorithms searching for better architecture of neural machines, or algorithms searching for better optimization algorithms. The complexity improvements in those algorithms might be particularly consequential. ↩︎

Comment by mishka on The Most Forbidden Technique · 2025-03-13T07:55:34.652Z · LW · GW

They should actually reference Yudkowsky.

Their paper https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def56387f/CoT_Monitoring.pdf lists over 70 references, but I don't see them mentioning Yudkowsky (someone should tell Schmidhuber ;-)).

This branch of the official science is younger than 10 years (and started as a fairly non-orthodox one; it's only recently that this has started to feel like the official one, certainly no earlier than the formation of Anthropic, and probably quite a bit later than that).

Comment by mishka on *NYT Op-Ed* The Government Knows A.G.I. Is Coming · 2025-03-05T06:23:15.657Z · LW · GW

This is probably correct, but also this is a report about the previous administration.

Normally, there is a lot of continuity in institutional knowledge between administrations, but this current transition is an exception, as the new admin has decided to deliberately break continuity as much as it can (this is very unusual).

And with the new admin, it's really difficult to say what they think. Vance publicly expresses an opinion worthy of Zuck, only more radical (gas pedal to the floor, forget about brakes). He is someone who believes at the same time that 1) AI will be extremely powerful, so all this emphasis is justified, 2) no safety measures at all are required, accelerate as fast as possible (https://www.lesswrong.com/posts/qYPHryHTNiJ2y6Fhi/the-paris-ai-anti-safety-summit).

Perhaps, he does not care about having a consistent world model, or he might think something different from what he publicly expresses. But he does sound like a CEO of a particularly reckless AI lab.

Comment by mishka on [NSFW] The Fuzzy Handcuffs of Liberation · 2025-02-24T15:40:13.106Z · LW · GW

except easier, because it requires no internal source of discipline

Actually, a number of things reducing the requirements for having an internal source of discipline do make things easier.

For example, deliberately maintaining a particular breath pattern (e.g. the so-called "consciously connected breath"/"circular breath", that is breathing without pauses between inhalations and exhalations, ideally with equal length for an inhale and an exhale) makes maintaining one's focus on the breath much easier.

Comment by mishka on AI alignment for mental health supports · 2025-02-24T15:33:32.941Z · LW · GW

It's a very natural AI application, but why would this be called "alignment", and how is this related to the usual meanings of "AI alignment"?

Comment by mishka on Dear AGI, · 2025-02-18T21:24:53.003Z · LW · GW

To a smaller extent, we already have this problem among humans: https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness. This stratification into "two camps" is rather spectacular.

But a realistic pathway towards eventually solving the "hard problem of consciousness" is likely to include tight coupling between biological and electronic entities resulting in some kind of "hybrid consciousness" which would be more amenable to empirical study.

Usually one assumes that this kind of research would be initiated by humans trying to solve the "hard problem" (or just looking for other applications for which this kind of setup might be helpful). But this kind of research into tight coupling between biological and electronic entities can also be initiated by AIs curious about this mysterious "human consciousness" so many texts talk about and wishing to experience it first-hand. In this sense, we don't need all AIs to be curious in this way, it's enough if some of them are sufficiently curious.

Comment by mishka on Artificial Static Place Intelligence: Guaranteed Alignment · 2025-02-18T18:48:05.916Z · LW · GW

Artificial Static Place Intelligence

This would be a better title (this points to the actual proposal here)

Comment by mishka on Programming Language Early Funding? · 2025-02-16T18:22:31.811Z · LW · GW

a future garbage-collected language in the vein of Swift, Scala, C#, or Java, but better

Have you looked at Julia?

Julia does establish a very strong baseline, if one is OK with an "intermediate discipline between dynamic typing and static typing"[1].

(Julia is also a counter-example to some of your thoughts, in the sense that they have managed to grow a strong startup around an open-source programming language and a vibrant community. But the starting point was indeed an academic collaboration; only when they started to experience success did they start to make it more commercial.)


  1. In the world of statically typed languages, Rust does seem to establish a very strong baseline, but it has a different memory management discipline. It's difficult to say what the best garbage-collected statically typed language is these days. I don't mean to say that there is no room for another programming language, but one does need to consider a stronger set of baselines than Swift, Scala, C#, and Java. Funding-wise, Rust also provides an interesting example. If one believes Wikipedia, "Software developer Graydon Hoare created Rust as a personal project while working at Mozilla Research in 2006. Mozilla officially sponsored the project in 2009." ↩︎

Comment by mishka on [Job ad] LISA CEO · 2025-02-09T16:06:12.280Z · LW · GW

Did they have one? Or is it the first time they are filling this position?

Comment by mishka on Racing Towards Fusion and AI · 2025-02-08T03:13:12.326Z · LW · GW

I'd say that the ability to produce more energy overall than what is being spent on the whole cycle would count as a "GPT-3 moment". No price constraints, so it does not need to reach the level of "economically feasible", but it should stop being "net negative" energy-wise (when one honestly counts all the energy inputs needed to make it work).

I, of course, don't know how to translate Q into this. GPT-4o tells me that it thinks that Q=10 is what is approximately needed for that (for "Engineering Break-even (reactor-level energy balance)"), at least for some of the designs, and Q in the neighborhood of 20-30 is what's needed for economic viability, but I don't really know if these are good estimates.

But assuming that these estimates are good, Q passing 10 would count as the GPT-3 moment.
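For what it's worth, the back-of-the-envelope relation between plasma Q and engineering break-even that I have in mind looks roughly like this (the efficiencies are my own assumed round numbers, not design-specific figures):

```python
# Rough sketch of how plasma gain Q relates to "engineering break-even".
eta_heating = 0.5    # assumed: wall-plug electricity -> heating power absorbed by the plasma
eta_electric = 0.4   # assumed: fusion power -> electricity out

def engineering_gain(q_plasma):
    # electricity out per unit of electricity spent on heating
    # (ignores other plant loads: magnets, cryogenics, pumps, ...)
    return q_plasma * eta_heating * eta_electric

for q in [1, 5, 10, 20, 30]:
    print(f"Q_plasma = {q:2d} -> Q_engineering ~ {engineering_gain(q):.1f}")

# With these assumptions, Q_plasma around 5 is where the electricity out matches
# the heating electricity alone; covering the loads ignored here pushes the
# requirement higher, into the Q ~ 10 ballpark mentioned above.
```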

What happens then might depend on the economic forecast (what's the demand for energy, what are expected profits, and so on). If they only expect to make profits typical for public utilities, and the whole thing is still heavily oriented towards publicly regulated setups, I would expect continuing collaboration.

If they expect some kind of super-profits, with market share being really important and with expectations of chunks of it being really lucrative, then I would not bet on continuing collaboration too much...

Comment by mishka on Racing Towards Fusion and AI · 2025-02-07T23:29:12.864Z · LW · GW

In the AI community, the transition from the prevailing spirit of cooperation to a very competitive situation happened around the GPT-3 revolution. GPT-3 brought unexpected progress in few-shot learning and in program synthesis, and that was the moment when it became clear to many people that AI was working, that its goals were technologically achievable, and many players in the industry started to estimate time horizons as being rather short.

Fusion has not reached its GPT-3 moment yet; that's one key difference. Helion has signed a contract selling some of its future energy to Microsoft, but we have no idea whether they will manage to actually deliver (on time, or ever).

Another key difference is, of course, that strong AI systems are expected to play a larger and larger role in making future AIs.

In fusion this "recursion" is unlikely; the energy needed to make more fusion stations or to create new fusion designs can come from any source...

Comment by mishka on We’re in Deep Research · 2025-02-04T18:00:29.747Z · LW · GW

Note that OpenAI has reported an outdated baseline for the GAIA benchmark.

A few days before the Deep Research presentation, a new GAIA benchmark SOTA was established (see the validation tab of https://huggingface.co/spaces/gaia-benchmark/leaderboard).

The actual SOTA (Jan 29, 2025, Trase Agent v0.3) is 70.3 average, 83.02 Level 1, 69.77 Level 2, 46.15 Level 3.

In the easiest category, Level 1, this SOTA is clearly better than the numbers reported even for Deep Research (pass@64), and it is generally slightly better than Deep Research (pass@1), except for Level 3.

Comment by mishka on The Self-Reference Trap in Mathematics · 2025-02-03T18:35:03.371Z · LW · GW

Yes, the technique of formal proofs, in effect, involves translation of high-level proofs into arithmetic.

So self-reference is fully present (that's why we have Gödel's results and other similar results).

What this implies, in particular, is that one can reduce a "real proof" to arithmetic; this would be ugly, and one should not do it in one's informal mathematical practice; but your post is not talking about pragmatics, you are referencing a "fundamental limit of self-reference".

And, certainly, there are some interesting fundamental limits of self-reference (that's why we have algorithmically undecidable problems and such). But this is different from issues of pragmatic math techniques.

What high-level abstraction buys us is a lot of structure and intuition. The constraints related to staying within arithmetic are pragmatic, and not fundamental (without high-level abstractions one loses some very powerful ways to structure things and to guide our intuition, and things stop being comprehensible to a human mind).

Comment by mishka on The Self-Reference Trap in Mathematics · 2025-02-03T17:47:31.893Z · LW · GW

When a solution is formalized inside a theorem prover, it is reduced to the level of arithmetic (a theorem prover is an arithmetic-level machine).

So a theory might be a very high-brow math, but a formal derivation is still arithmetic (if one just focuses on the syntax and the formal rules, and not on the presumed semantics).

Comment by mishka on AI #99: Farewell to Biden · 2025-01-17T05:17:03.119Z · LW · GW

The alternative hypothesis does need to be said, especially after someone at a party outright claimed it was obviously true, and with the general consensus that the previous export controls were not all that tight. That alternative hypothesis is that DeepSeek is lying and actually used a lot more compute and chips it isn’t supposed to have. I can’t rule it out.

Re DeepSeek cost-efficiency, we are seeing more claims pointing in that direction.

In a similarly unverified claim, the founder of 01.ai (who is sufficiently known in the US according to https://en.wikipedia.org/wiki/Kai-Fu_Lee) seems to be claiming that the training cost of their Yi-Lightning model is only 3 million dollars or so. Yi-Lightning is a very strong model released in mid-Oct-2024 (when one compares it to DeepSeek-V3, one might want to check "math" and "coding" subcategories on https://lmarena.ai/?leaderboard; the sources for the cost claim are https://x.com/tsarnick/status/1856446610974355632 and https://www.tomshardware.com/tech-industry/artificial-intelligence/chinese-company-trained-gpt-4-rival-with-just-2-000-gpus-01-ai-spent-usd3m-compared-to-openais-usd80m-to-usd100m, and we probably should similarly take this with a grain of salt).

But all this does seem to be well within what's possible. Here is the famous https://github.com/KellerJordan/modded-nanogpt ongoing competition, and it took people about 8 months to accelerate Andrej Karpathy's PyTorch GPT-2 trainer from llm.c by 14x on a 124M parameter GPT-2 (what's even more remarkable is that almost all that acceleration is due to better sample efficiency with the required training data dropping from 10 billion tokens to 0.73 billion tokens on the same training set with the fixed order of training tokens).

Some of the techniques used by the community pursuing this might not scale to really large models, but most of them probably would (as we see in their mid-Oct experiment demonstrating that what was back then a 3-4x acceleration scales to the 1.5B version).

So when an org is claiming a 10x-20x efficiency jump compared to what it presumably took a year or more ago, I am inclined to say, "why not, and probably the leaders are also in possession of similar techniques now, even if they are less pressed by compute shortage".

The real question is how fast these numbers will continue to go down for similar levels of performance... It has been very expensive to be the very first org achieving a given new level, but the cost seems to be dropping rapidly for the followers...

Comment by mishka on Rebuttals for ~all criticisms of AIXI · 2025-01-10T04:19:37.547Z · LW · GW

However, I don't view safe tiling as the primary obstacle to alignment. Constructing even a modestly superhuman agent which is aligned to human values would put us in a drastically stronger position and currently seems out of reach. If necessary, we might like that agent to recursively self-improve safely, but that is an additional and distinct obstacle. It is not clear that we need to deal with recursive self-improvement below human level.

I am not sure that treating recursive self-improvement via tiling frameworks is necessarily a good idea, but setting this aspect aside, one obvious weakness with this argument is that it mentions a superhuman case and a below human level case, but it does not mention the approximately human level case.

And it is precisely the approximately human level case where we have a lot to say about recursive self-improvement, and where it feels that avoiding this set of considerations would be rather difficult.

  1. Humans often try to self-improve, and human-level software will have advantage over humans at that.

Humans are self-improving in the cognitive sense by shaping their learning experiences, and also by controlling their nutrition and various psychoactive factors modulating cognition. The desire to become smarter and to improve various thinking skills is very common.

Human-level software would have great advantage over humans at this, because it can hack at its own internals with great precision at the finest resolution and because it can do so in a reversible fashion (on a copy, or after making a backup), and so can do it in a relatively safe manner (whereas a human has difficulty hacking their own internals with required precision and is also taking huge personal risks if hacking is sufficiently radical).

  2. Collective/multi-agent aspects are likely to be very important.

People are already talking about possibilities of "hiring human-level artificial software engineers" (and, by extension, human-level artificial AI researchers). The wisdom of having an agent form-factor here is highly questionable, but setting this aspect aside and focusing only on technical feasibility, we see the following.

One can hire multiple artificial software engineers with long-term persistence (of features, memory, state, and focus) into an existing team of human engineers. Some of those teams will work on making next generations of better artificial software engineers (and artificial AI researchers). So now we are talking about mixed teams with human and artificial members.

By definition, we can say that those artificial software engineers and artificial AI researchers have reached human level, if a team of those entities would be able to fruitfully work on the next generation of artificial software engineers and artificial AI researchers even in the absence of any human team members.

This multi-agent setup is even more important than individual self-improvement, because this is what the mainstream trend might actually be leaning towards, judging by some recent discussions. Here we are talking about a multi-agent setup, and about recursive self-improvement of the community of agents, rather than focusing on self-improvement of individual agents.

  3. Current self-improvement attempts.

We actually do see a lot of experiments with various forms of recursive self-improvement even at the current below-human level. We are just lucky that all those attempts have been saturating at reasonable levels so far.

We currently don't have a good enough understanding to predict when they stop saturating, and what the dynamics would be when they stop saturating. But self-improvement by a community of approximately human-level artificial software engineers and artificial AI researchers competitive with top human software engineers and top human AI researchers seems unlikely to saturate (or, at least, we should seriously consider the possibility that it won't saturate).

  4. At the same time, the key difficulties of AI existential safety are tightly linked to recursive self-modifications.

The most intractable aspect of the whole thing is how to preserve any properties indefinitely through radical self-modifications. I think this is the central difficulty of AI existential safety. Things will change unpredictably. How can one shape this unpredictable evolution so that some desirable invariants do hold?

These invariants would be invariant properties of the whole ecosystem, not of individual agents; they would be the properties of a rapidly changing world, not of a particular single system (unless one is talking about a singleton which is very much in control of everything). This seems to be quite central to our overall difficulty with AI existential safety.

Comment by mishka on Chinese Researchers Crack ChatGPT: Replicating OpenAI’s Advanced AI Model · 2025-01-05T14:41:24.336Z · LW · GW

I think this is a misleading clickbait title. It references a popular article with the same misleading clickbait title, and the only thing that popular article references is a youtube video with the misleading clickbait title, "Chinese Researchers Just CRACKED OpenAI's AGI Secrets!"

However, the description of that youtube video does reference the paper in question and a twitter thread describing this paper:

Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective, https://arxiv.org/abs/2412.14135

https://x.com/rohanpaul_ai/status/1872713137407049962

Nothing is "cracked" here. It's just a roadmap which might work or not, depending on luck and effort. It might correspond to what's under the hood of the o1 models or not (never mind o3; the paper was published a couple of days before the o3 announcement).

The abstract of the paper ends with

"Existing open-source projects that attempt to reproduce o1 can be seem as a part or a variant of our roadmap. Collectively, these components underscore how learning and search drive o1's advancement, making meaningful contributions to the development of LLM."

The abstract also has a distinct feeling of being written by an LLM. The whole paper is just a discussion of various things one could try if one wants to reproduce o1. It also references a number of open-source and closed-source implementations of reasoners over LLMs. There are no new technical advances in the paper.

Comment by mishka on o3, Oh My · 2025-01-01T08:54:05.752Z · LW · GW

Right. We should probably introduce a new name, something like narrow AGI, to denote a system which is AGI-level in coding and math.

This kind of system will be "AGI" as redefined by Tom Davidson in https://www.lesswrong.com/posts/Nsmabb9fhpLuLdtLE/takeoff-speeds-presentation-at-anthropic:

“AGI” (=AI that could fully automate AI R&D)

This is what matters for AI R&D speed and for almost all recursive self-improvement.

Zvi is not quite correct when he is saying

If o3 was as good on most tasks as it is at coding or math, then it would be AGI.

o3 is not that good at coding and math (e.g. it only gets 71.7% on SWE-bench Verified); it is not a "narrow AGI" yet. But it is quite strong, and it's a giant step forward.

For example, if one takes Sakana's "AI scientist", upgrades it slightly, and uses o3 as a back-end, it is likely that one can generate NeurIPS/ICLR quality papers and as many of those as one wants.

So, another upgrade (or a couple of upgrades) beyond o3, and we will reach that coveted "narrow AGI" stage.

What OpenAI has demonstrated is that it is much easier to achieve "narrow AGI" than "full AGI". This does suggest a road to ASI without going through anything remotely close to a "full AGI" stage, with missing capabilities to be filled afterwards.

Comment by mishka on mishka's Shortform · 2024-11-24T14:10:21.642Z · LW · GW

Indeed

Comment by mishka on mishka's Shortform · 2024-11-24T07:25:19.164Z · LW · GW

METR releases a report, Evaluating frontier AI R&D capabilities of language model agents against human experts: https://metr.org/blog/2024-11-22-evaluating-r-d-capabilities-of-llms/

Daniel Kokotajlo and Eli Lifland both feel that one should update towards shorter timelines remaining until the start of rapid acceleration via AIs doing AI research based on this report:

https://x.com/DKokotajlo67142/status/1860079440497377641

https://x.com/eli_lifland/status/1860087262849171797

Comment by mishka on AI Safety Salon with Steve Omohundro · 2024-11-22T21:23:58.165Z · LW · GW

the meetup page says 7:30pm, but actually the building asks people to leave by 9pm

Comment by mishka on mishka's Shortform · 2024-11-14T08:05:51.308Z · LW · GW

Gwern was on Dwarkesh yesterday: https://www.dwarkeshpatel.com/p/gwern-branwen

We recorded this conversation in person. In order to protect Gwern’s anonymity, we created this avatar. This isn’t his voice. This isn’t his face. But these are his words.

Comment by mishka on Bitter lessons about lucid dreaming · 2024-10-17T01:23:47.320Z · LW · GW

Thanks, that's very useful.

If one decides to use galantamine, is it known if one should take it right before bedtime, or anytime during the preceding day, or in some other fashion?

Comment by mishka on Distillation Of DeepSeek-Prover V1.5 · 2024-10-16T11:48:43.554Z · LW · GW

I think it's a good idea to include links to the originals:

https://arxiv.org/abs/2408.08152 - "DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search"

https://github.com/deepseek-ai/DeepSeek-Prover-V1.5

Comment by mishka on mishka's Shortform · 2024-10-15T04:30:39.968Z · LW · GW

Scott Alexander wrote a very interesting post covering the details of the political fight around SB 1047 a few days ago: https://www.astralcodexten.com/p/sb-1047-our-side-of-the-story

I've learned a lot of things new to me reading it (which is remarkable given how much material related to SB 1047 I have seen before)

Comment by mishka on How Should We Use Limited Time to Maximize Long-Term Impact? · 2024-10-13T05:34:38.319Z · LW · GW

the potential of focusing on chemotherapy treatment timing

More concretely (this is someone else's old idea), what I think is still not done is the following. Chemo kills dividing cells, which is why rapidly renewing tissues and cell populations are particularly vulnerable.

If one wants to spare one of those cell types (say, a particular population of immune cells), one should take the typical period of its renewal and use that as the period of the chemo sessions (the time between sessions), creating a "resonance" of sorts between the schedule and the renewal period of the selected cell population. Then one should expect to spare most of that population (and might potentially be able to use higher doses for better effect, if the spared population is the most critical one; this does need some precision, not today's typical "relaxed logistics" approach where a few days this way or that in the schedule is nothing to worry about).
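A toy model of that "resonance" (a cartoon, not a biological simulation; the period, window width, and number of doses are arbitrary and only illustrate the mechanism):

```python
import random

# Each cell in the population to be spared divides once per renewal period P,
# at its own fixed phase; a chemo dose kills the cells that happen to be in
# their dividing window (of width w) at the moment of the dose.
random.seed(0)
P, w, doses = 10.0, 1.0, 6   # renewal period, vulnerable window, number of sessions

def surviving_fraction(T, n_cells=100_000):
    phases = [random.uniform(0.0, P) for _ in range(n_cells)]
    alive = [True] * n_cells
    for k in range(doses):
        t = k * T
        for i, phi in enumerate(phases):
            if alive[i] and (t - phi) % P < w:
                alive[i] = False
    return sum(alive) / n_cells

print("sessions spaced exactly P apart:", surviving_fraction(P))        # ~0.9 survive
print("sessions spaced 0.7*P apart:    ", surviving_fraction(0.7 * P))  # ~0.4 survive
```

With the "resonant" spacing, every dose hits the same phase slice of the population, so only that one slice is ever lost; a generic spacing keeps hitting new slices with each dose.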

I don't know if that ever progressed beyond the initial idea...

(That's just one example, of course, there is a lot of things which can be considered and, perhaps, tried.)

Comment by mishka on How Should We Use Limited Time to Maximize Long-Term Impact? · 2024-10-13T03:11:11.533Z · LW · GW

This depends on many things (one's skills, one's circumstances, one's preferences and inclinations (the efficiency of one's contributions greatly depends on one's preferences and inclinations)).

I have stage 4 cancer, so statistically, my time may be more limited than most. I’m a PhD student in Computer Science with a strong background in math (Masters).

In your case, there are several strong arguments for you to focus on research efforts which can improve your chances of curing it (or, at least, of being able to maintain the situation for a long time), and a couple of (medium strength?) arguments against this choice.

For:

  • If you succeed, you'll have more time to make impact (and so if your chance of success is not too small, this will contribute to your ability to maximize your overall impact, statistically speaking).

  • Of course, any success here will imply a lot of publicly valuable impact (there are plenty of people in a similar position health-wise, and they badly need progress to occur ASAP).

  • The rapid development of applied AI models (both general-purpose models and biology-specific models) creates new opportunities to datamine and juxtapose a variety of potentially relevant information and to uncover new connections which might lead to effective solutions. Our tools progress so fast that people are slow to adapt their thinking and methods to that progress, so new people with a fresh outlook have reasonable shots (of course, they should aim for collaborations). In this sense, your PhD CS studies and your strong math are very helpful (a lot of the relevant models are dynamical systems; the timing of interventions is typically not managed correctly as far as I know (there are plenty of ways to be nice to particularly vulnerable tissues by timing the chemo right, and thus to make it more effective, but this is not part of the standard of care yet as far as I know); and so on).

  • You are likely to be strongly motivated and to be able to maintain strong motivation. At the same time, you'll know that it is the result that counts here, not the effort, and so you will be likely to try your best to approach this in a smart way, rather than through brute-force effort.

Possibly against:

(Of course, there are plenty of other interesting things one can do with this background (PhD CS studies and strong math). For example, one might decide to disregard the health situation and to dive into technical aspects of AI development and AI existential safety issues, especially if one's estimate of AI timelines yields really short timelines.)

Comment by mishka on My 10-year retrospective on trying SSRIs · 2024-10-02T17:29:21.833Z · LW · GW

Thanks for the references.

Yes, the first two of those do mention co-occurring anxiety in the title.

The third study suggests a possibility that it might just work as an effective antidepressant as well. (I hope there will be further studies like that; yes, this might be a sufficient reason to try it for depression, even if one does not have anxiety. It might work, but it's clearly not common knowledge yet.)

Comment by mishka on Three main arguments that AI will save humans and one meta-argument · 2024-10-02T13:50:48.056Z · LW · GW

Your consideration seems to assume that the AI is an individual, not a phenomenon of "distributed intelligence":

The first argument is that AI thinks it may be in a testing simulation, and if it harms humans, it will be turned off.

etc. That is, indeed, the only case we are at least starting to understand well (unfortunately, our understanding of situations where AIs are not individuals seems to be extremely rudimentary).

If the AI is an individual, then one can consider a case of a "singleton" or a "multipolar case".

In some sense, for a self-improving ecosystem of AIs, a complicated multipolar scenario seems more natural, as new AIs are getting created and tested quite often in realistic self-improvement scenarios. In any case, a "singleton" only looks "monolithic" from the outside; from the inside, it is still likely to be a "society of mind" of some sort.

If there are many such AI individuals with uncertain personal future (individuals who can't predict their future trajectory and their future relative strength in the society and who care about their future and self-preservation), then AI individuals might be interested in a "world order based on individual rights", and then rights of all individuals (including humans) might be covered in such a "world order".

This consideration is my main reason for guarded optimism, although there are many uncertainties.

In some sense, my main reasons for guarded optimism are in hoping that the AI ecosystem will manage to act rationally and will manage to avoid chaotic destructive developments. As you say

It is not rational to destroy a potentially valuable thing.

And my main reasons for pessimism are in being afraid that the future will resemble an uncontrolled, super-fast, chaotic, accelerating "natural evolution" (in this kind of scenario AIs seem likely to destroy everything, including themselves; they do have an existential safety problem of their own, as they can easily destroy the "fabric of reality" if they don't exercise collaboration and self-control).

Comment by mishka on Should we abstain from voting? (In nondeterministic elections) · 2024-10-02T13:18:22.567Z · LW · GW

One might consider that some people have strong preferences for the outcome of an election and some people have weak preferences, but that there is usually no way to express the strength of one's preferences during a vote. At the same time, the probability that one would actually go ahead and vote in a race does correlate with the strength of one's preferences.

So, perhaps, this is indeed working as intended. People who have stronger preferences are more likely to vote, and so their preferences are more likely to be taken into account in a statistical sense.

It seems that the strength of one's preferences is (automatically, but imperfectly) taken into account via this statistical mechanism.
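A minimal numerical illustration of that statistical mechanism (group sizes and turnout probabilities are made up):

```python
# Turnout probability is taken to be proportional to the strength of preference.
groups = [
    # (name, eligible people, preference strength used as turnout probability)
    ("weakly prefers A",   600, 0.2),
    ("strongly prefers B", 400, 0.9),
]

expected_votes = {name: n * strength for name, n, strength in groups}
print(expected_votes)  # {'weakly prefers A': 120.0, 'strongly prefers B': 360.0}
# The smaller group with stronger preferences carries the expected vote, i.e.
# turnout acts as an (imperfect) proxy for the strength of preferences.
```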

Comment by mishka on Newsom Vetoes SB 1047 · 2024-10-02T00:11:14.767Z · LW · GW

Thanks for the great post!

Also it’s California, so there’s some chance this happens, seriously please don’t do it, nothing is so bad that you have to resort to a ballot proposition, choose life

Why are you saying this? In what sense "nothing is so bad"?

The reason why people who have libertarian sensibilities and distrust for the government's track record in general, and specifically for its track record in tech regulation, are making an exception in this case is future AI's strong potential for catastrophic and existential risks.

So why should people who generally dislike the mechanism and track record of California ballot propositions not make an exception here as well?

The whole point of all this effort around SB 1047 is that "nothing is so bad" is an incorrect statement.

And especially given that you are correctly saying:

Thus I reiterate the warning: SB 1047 was probably the most well-written, most well-considered and most light touch bill that we were ever going to get. Those who opposed it, and are now embracing the use-case regulatory path as an alternative thinking it will be better for industry and innovation, are going to regret that. If we don’t get back on the compute and frontier model based path, it’s going to get ugly.

There is still time to steer things back in a good direction. In theory, we might even be able to come back with a superior version of the model-based approach, if we all can work together to solve this problem before something far worse fills the void.

But we’ll need to work together, and we’ll need to move fast.

Sure, there is still a bit of time for a normal legislative effort (this time with close coordination with Newsom, otherwise he will just veto it again), but if you really think that, even if the normal route fails, the ballot route is still counter-productive, you need to make a much stronger case for that.

Especially given that the ballot measure would probably pass with a large margin and flying colors...

Comment by mishka on My 10-year retrospective on trying SSRIs · 2024-09-23T19:06:54.727Z · LW · GW

Silexan

For anxiety treatment only, if I understand it correctly.

There is no claim that it works as an antidepressant, as far as I know.

Comment by mishka on A Nonconstructive Existence Proof of Aligned Superintelligence · 2024-09-21T21:23:07.102Z · LW · GW

No, not microscopic.

Coherent light produced by lasers is not microscopic; we see its traces in the air. And we see the consequences (old-fashioned holography and the ability to cut things with focused light, even at large distances). Room temperature is fine for that.

Superconductors used in the industry are not microscopic (and the temperatures are high enough to enable industrial use of them in rather common devices such as MRI scanners).

Comment by mishka on A Nonconstructive Existence Proof of Aligned Superintelligence · 2024-09-21T21:19:15.876Z · LW · GW

It's just... having a proof is supposed to boost our confidence that the conclusion is correct...

if the proof relies on assumptions which are already quite far from the majority opinion about our actual reality (and are probably going to deviate further, as AIs will be better physicists and engineers than us and will leverage the strangeness of our physics much further than we do), then what's the point of that "proof"?

how does having this kind of "proof" increase our confidence in what seems informally correct for a single branch reality (and rather uncertain in a presumed multiverse, but we don't even know if we are in a multiverse, so bringing a multiverse in might, indeed, be one of the possible objections to the statement, but I don't know if one wants to pursue this line of discourse, because it is much more complicated than what we are doing here so far)?

(as an intellectual exercise, a proof like that is still of interest, even under the unrealistic assumption that we live in a computable reality, I would not argue with that; it's still interesting)