No Anthropic Evidence 2012-09-23T10:33:06.994Z
A Mathematical Explanation of Why Charity Donations Shouldn't Be Diversified 2012-09-20T11:03:48.603Z
Consequentialist Formal Systems 2012-05-08T20:38:47.981Z
Predictability of Decisions and the Diagonal Method 2012-03-09T23:53:28.836Z
Shifting Load to Explicit Reasoning 2011-05-07T18:00:22.319Z
Karma Bubble Fix (Greasemonkey script) 2011-05-07T13:14:29.404Z
Counterfactual Calculation and Observational Knowledge 2011-01-31T16:28:15.334Z
Note on Terminology: "Rationality", not "Rationalism" 2011-01-14T21:21:55.020Z
Unpacking the Concept of "Blackmail" 2010-12-10T00:53:18.674Z
Agents of No Moral Value: Constrained Cognition? 2010-11-21T16:41:10.603Z
Value Deathism 2010-10-30T18:20:30.796Z
Recommended Reading for Friendly AI Research 2010-10-09T13:46:24.677Z
Notion of Preference in Ambient Control 2010-10-07T21:21:34.047Z
Controlling Constant Programs 2010-09-05T13:45:47.759Z
Restraint Bias 2009-11-10T17:23:53.075Z
Circular Altruism vs. Personal Preference 2009-10-26T01:43:16.174Z
Counterfactual Mugging and Logical Uncertainty 2009-09-05T22:31:27.354Z
Bloggingheads: Yudkowsky and Aaronson talk about AI and Many-worlds 2009-08-16T16:06:18.646Z
Sense, Denotation and Semantics 2009-08-11T12:47:06.014Z
Rationality Quotes - August 2009 2009-08-06T01:58:49.178Z
Bayesian Utility: Representing Preference by Probability Measures 2009-07-27T14:28:55.021Z
Eric Drexler on Learning About Everything 2009-05-27T12:57:21.590Z
Consider Representative Data Sets 2009-05-06T01:49:21.389Z
LessWrong Boo Vote (Stochastic Downvoting) 2009-04-22T01:18:01.692Z
Counterfactual Mugging 2009-03-19T06:08:37.769Z
Tarski Statements as Rationalist Exercise 2009-03-17T19:47:16.021Z
In What Ways Have You Become Stronger? 2009-03-15T20:44:47.697Z
Storm by Tim Minchin 2009-03-15T14:48:29.060Z


Comment by Vladimir_Nesov on Supposing the 1bit LLM paper pans out · 2024-02-29T15:23:32.426Z · LW · GW

The paper is not about post-training quantization; instead it's about quantized training (this is discussed more clearly in the original BitNet paper). The representation is ternary {-1, 0, 1} from the start: the network learns to cope with that constraint throughout pre-training, instead of being subjected to the brain damage of quantization after training.

Compare this with

where the Microscaling block number format is used to train a transformer at essentially 4 bits per weight, achieving the same perplexity as with 32 bit floating point weights, see Figure 4 on page 7. If perplexity doesn't change for quantized training when going down to 4 bits, it's not too shocking that it doesn't significantly change at 1.6 bits either.
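A minimal numpy sketch of the distinction, under assumptions not taken from the paper (a toy linear model, a per-tensor absmean scale rather than BitNet's exact formula): the latent weights stay full precision, the forward pass sees only their ternary quantization, and gradients are applied straight through to the latent weights.

```python
import numpy as np

def ternary_quantize(w):
    # Map weights to scale * {-1, 0, +1}; the absmean scale is an
    # illustrative assumption, not the exact BitNet scheme.
    scale = np.mean(np.abs(w)) + 1e-8
    return scale * np.clip(np.round(w / scale), -1, 1)

rng = np.random.default_rng(0)
w = rng.normal(size=4)                    # latent full-precision weights
x = rng.normal(size=(100, 4))
y = x @ np.array([1.0, -2.0, 0.5, 0.0])   # toy regression target

for _ in range(200):
    wq = ternary_quantize(w)              # forward pass uses only ternary weights
    grad = x.T @ (x @ wq - y) / len(x)    # gradient at wq, passed straight through
    w -= 0.1 * grad                       # update the latent weights, not wq

print(ternary_quantize(w))                # each entry lies in scale * {-1, 0, 1}
```

The model never uses unconstrained weights at inference time, which is the sense in which it "learns to cope" with the constraint during training, as opposed to quantizing a finished full-precision model.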

Comment by Vladimir_Nesov on Conspiracy Theorists Aren't Ignorant. They're Bad At Epistemology. · 2024-02-29T14:28:51.260Z · LW · GW

Refuting something wrong is only useful when there are identifiable failures of local validity (which often only makes it stronger). Refuting something as a whole is better thought of as offering an alternative frame that doesn't particularly interact with the "refuted" frame. The key obstruction is unwillingness to contradict yourself, to separately study ideas that are clearly inconsistent with each other, without taking a side in the contradiction in the context of studying either point of view.

So a flat Earth theory might have a particular problem worth talking about, and hashing out the problem is liable to make a stronger flat Earth theory. Or the "refutation" is not about the flat Earth theory, it's instead an explanation of a non-flat Earth theory that's not at all a refutation, its subject matter is completely separate. The difficulty is when flat Earth conviction prevents a person from curious engagement with non-flat Earth details.

Comment by Vladimir_Nesov on Boundary Violations vs Boundary Dissolution · 2024-02-26T21:19:18.072Z · LW · GW

Membranes are filters: they let in admissible things and repel inadmissible things. When an agent manages a membrane, it both maintains the membrane's existence and configures its filtering. Manipulation or damage suffered by the agent can result in configuring a membrane to admit harmful things, or in failing to maintain the membrane's existence. There are many membranes an agent may be involved in managing.

Comment by Vladimir_Nesov on Retirement Accounts and Short Timelines · 2024-02-24T17:04:33.096Z · LW · GW

Any increase in scale carries some chance of AGI at this point, since unlike weaker models, GPT-4 is not stupid in any clear way; it might be just below the threshold of scale that enables an LLM to get its act together. This gives some 2024 probability.

More likely, a larger model "merely" makes job-level agency feasible for relatively routine human jobs, but that alone would suddenly make $50-$500 billion runs financially reasonable. Given the premise of job-level agency at <$5 billion scale, the larger runs likely suffice for AGI. The Gemini report says training took place in multiple datacenters, which suggests that this sort of scaling might already be feasible, except for the risk that it produces something insufficiently commercially useful to justify the cost (and waiting improves the prospects). So this might all happen as early as 2025 or 2026.

Comment by Vladimir_Nesov on Retirement Accounts and Short Timelines · 2024-02-24T11:33:07.374Z · LW · GW

I'd put more probability in the scenario where good $5 billion 1e27 FLOPs runs give mediocre results, so that more scaling remains feasible but lacks an expectation of success. With how expensive the larger experiments would be, it could take many years for someone to take another draw from the apocalypse deck. That alone adds maybe 2% for 10 years after 2026 or so, and there are other ways for AGI to start working.

Comment by Vladimir_Nesov on The Gemini Incident · 2024-02-24T10:20:27.814Z · LW · GW

The question "Aligned to whom?" is sufficiently vague to admit many reasonable interpretations, but has some unfortunate connotations. It sounds like there's a premise that AIs are always aligned to someone, making the possibility that they are aligned to no one but themselves less salient. And it boosts the frame of competition, as opposed to distribution of radical abundance, of possibly there being someone who gets half of the universe.

Comment by Vladimir_Nesov on The Gemini Incident · 2024-02-23T22:27:40.379Z · LW · GW

Building a powerful AI such that doing so is a good thing rather than a bad thing. Perhaps even there being survivors shouldn't insist on the definite article, on being the question, as there are many questions with various levels of severity, that are not mutually exclusive.

Comment by Vladimir_Nesov on The natural boundaries between people · 2024-02-23T09:28:30.295Z · LW · GW

When boundaries leak, it's important to distinguish commitment to rectify them from credence that they didn't leak.

These are all failures to acknowledge the natural boundaries that exist between individuals.

Comment by Vladimir_Nesov on Does increasing the power of a multimodal LLM get you an agentic AI? · 2024-02-23T09:06:24.642Z · LW · GW

You shouldn't worry yet, the models need to be far more capable.

The right time to start worrying is too early, otherwise it will be too late.

(I agree in the sense that current models very likely can't be made existentially dangerous, and in that sense "worrying" is incorrect, but the proper use of worrying is planning for the uncertain future, a different sense of "worrying".)

Comment by Vladimir_Nesov on Does increasing the power of a multimodal LLM get you an agentic AI? · 2024-02-23T08:51:39.057Z · LW · GW

It's not entirely clear how and why GPT-4 (possibly a 2e25 FLOPs model) or Gemini Ultra 1.0 (possibly a 1e26 FLOPs model) don't work as autonomous agents, but it seems that they can't. So it's not clear that the next generation of LLMs built in a similar way will enable significant agency either. There are millions of AI GPUs currently being produced each year, and millions of GPUs can only support a 1e28-1e30 FLOPs training run (that doesn't individually take years to complete). There's (barely) enough text data for that.

GPT-2 would take about 1e20 FLOPs to train with modern methods, on the FLOPs log scale it's already further away from GPT-4 than GPT-4 is from whatever is feasible to build in the near future without significant breakthroughs. So there are only about two more generations of LLMs in the near future if most of what changes is scale. It's not clear that this is enough, and it's not clear that this is not enough.
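The log-scale comparison can be checked with the comment's own rough numbers (all of which are speculative estimates, not official figures):

```python
import math

gpt2 = 1e20          # GPT-2 trained with modern methods (rough estimate)
gpt4 = 2e25          # speculated GPT-4 training compute
feasible = 1e28      # low end of the near-future 1e28-1e30 FLOPs range

ooms_below = math.log10(gpt4 / gpt2)      # GPT-2 -> GPT-4 gap in orders of magnitude
ooms_above = math.log10(feasible / gpt4)  # GPT-4 -> near-future-feasible gap

print(f"{ooms_below:.1f} OOMs vs {ooms_above:.1f} OOMs")  # prints "5.3 OOMs vs 2.7 OOMs"
```

So on these estimates GPT-2 sits about 5.3 orders of magnitude below GPT-4, while the feasible near-future scale is only about 2.7 above it, which is the "further away" claim in log-FLOPs terms.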

With Sora, the underlying capability is not just video generation, it's also video perception, looking at the world instead of dreaming of it. A sufficiently capable video model might be able to act in the world by looking at it in the same way a chatbot acts in a conversation by reading it. Models that can understand images are already giving new ways of specifying tasks and offering feedback on performance in robotics, and models that can understand video will only do this better.

Comment by Vladimir_Nesov on Daniel Kokotajlo's Shortform · 2024-02-22T15:00:06.512Z · LW · GW

The premise is autonomous agents at near-human level with propensity and opportunity to establish global lines of communication with each other. Being served via API doesn't in itself control what agents do, especially if users can ask the agents to do all sorts of things and so there are no predefined airtight guardrails on what they end up doing and why. Large context and possibly custom tuning also makes activities of instances very dissimilar, so being based on the same base model is not obviously crucial.

The agents only need to act autonomously the way humans do, don't need to be the smartest agents available. The threat model is that autonomy at scale and with high speed snowballs into a large body of agent culture, including systems of social roles for agent instances to fill (which individually might be swapped out for alternative agent instances based on different models). This culture exists on the Internet, shaped by historical accidents of how the agents happen to build it up, not necessarily significantly steered by anyone (including individual agents). One of the things such culture might build up is software for training and running open source agents outside the labs. Which doesn't need to be cheap or done without human assistance. (Imagine the investment boom once there are working AGI agents, not being cheap is unlikely to be an issue.)

Superintelligence plausibly breaks this dynamic by bringing much more strategicness than feasible at near-human level. But I'm not sure established labs can keep the edge and get (aligned) ASI first once the agent culture takes off. And someone will probably start serving autonomous near-human level agents via API long before any lab builds superintelligence in-house, even if there is significant delay between the development of first such agents and anyone deploying them publicly.

Comment by Vladimir_Nesov on Sinclair Chen's Shortform · 2024-02-22T14:38:08.176Z · LW · GW

For it to make sense to say that the math is wrong, there needs to be some sort of ground truth, making it possible for math to also be right, in principle. Even doing the math poorly is exercise that contributes to eventually making the math less wrong.

Comment by Vladimir_Nesov on Daniel Kokotajlo's Shortform · 2024-02-22T01:14:19.772Z · LW · GW

If allowed to operate in the wild and globally interact with each other (as seems almost inevitable), agents won't exist strictly within well-defined centralized bureaucracies; the thinking speed that enables impactful research also enables growing elaborate systems of social roles that drive collective decision making, in a way distinct from individual decision making. Agent-operated firms might be an example where economy drives decisions, but nudges of all kinds can add up at scale, becoming trends that are impossible to steer.

Comment by Vladimir_Nesov on Daniel Kokotajlo's Shortform · 2024-02-22T00:46:52.261Z · LW · GW

The thing that seems more likely to first get out of hand is activity of autonomous non-ASI agents, so that the shape of loss of control is given by how they organize into a society. Alignment of individuals doesn't easily translate into alignment of societies. Development of ASI might then result in another change, if AGIs are as careless and uncoordinated as humanity.

Comment by Vladimir_Nesov on "Open Source AI" isn't Open Source · 2024-02-15T15:32:26.706Z · LW · GW

A model is like a compiled binary, except compilation is extremely expensive. Distributing a model alone and claiming it's "open source" is like calling a binary distributed without its source code "open source".

The term that's catching on is open weight models as distinct from open source models. The latter would need to come with datasets and open source training code that enables reproducing the model.

Comment by Vladimir_Nesov on And All the Shoggoths Merely Players · 2024-02-15T15:08:38.189Z · LW · GW

My impression is that one point Hanson was making in the spring-summer 2023 podcasts is that some major issues with AI risk don't seem different in kind from cultural value drift that's already familiar to us. There are obvious disanalogies, but my understanding of this point is that there is still a strong analogy that people avoid acknowledging.

If human value drift was already understood as a serious issue, the analogy would seem reasonable, since AI risk wouldn't need to involve more than the normal kind of cultural value drift compressed into short timelines and allowed to exceed the bounds from biological human nature. But instead there is perceived safety to human value drift, so the argument sounds like it's asking to transport that perceived safety via the analogy over to AI risk, and there is much arguing on this point without questioning the perceived safety of human value drift. So I think what makes the analogy valuable is instead transporting the perceived danger of AI risk over to the human value drift side, giving another point of view on human value drift, one that makes the problem easier to see.

Comment by Vladimir_Nesov on Managing risks while trying to do good · 2024-02-11T14:27:50.000Z · LW · GW

You are directing a lot of effort at debating details of particular proxies for an optimization target, pointing out flaws. My point is that strong optimization for any proxy that can be debated in this way is not a good idea, so improving such proxies doesn't actually help. A sensible process for optimizing something has to involve continually improving formulations of the target as part of the process. It shouldn't simply be handed an already-formulated target: if handing it one would seem useful, then the process is already fundamentally wrong in what it's doing, and giving it a better target won't fix that.

The way I see it, CEV-as-formulated is gesturing at the kind of thing an optimization target might look like. It's in principle some sort of proxy for it, but it's not an actionable proxy for anything that can't come up with a better proxy on its own. So improving CEV-as-formulated might make the illustration better, but for anything remotely resembling its current form it's not a useful step for actually building optimizers.

Variants of CEV all having catastrophic flaws is some sort of argument that there is no optimization target that's worth optimizing for. Boundaries seem like a promising direction for addressing the group vs. individual issues. Never optimizing for any proxy more strongly than its formulation is correct (and always pursuing improvement over current proxies) responds to there often being hidden flaws in alignment targets that lead to catastrophic outcomes.

Comment by Vladimir_Nesov on Why I no longer identify as transhumanist · 2024-02-11T13:24:36.258Z · LW · GW

The blast radius of AGIs is unbounded in the same way as that of humanity, there is potential for taking over all of the future. There are many ways of containing it, and alignment is a way of making the blast a good thing. The point is that a sufficiently catastrophic failure that doesn't involve containing the blast is unusually impactful. Arguments about ease of containing the blast are separate from this point in the way I intended it.

If you don't expect AGIs to become overwhelmingly powerful faster than they are made robustly aligned, containing the blast takes care of itself right until it becomes unnecessary. But with the opposite expectation, containing becomes both necessary (since early AGIs are not yet robustly aligned) and infeasible (since early AGIs are very powerful). So there's a question of which expectation is correct, but the consequences of either position seem to straightforwardly follow.

Comment by Vladimir_Nesov on And All the Shoggoths Merely Players · 2024-02-11T13:02:45.085Z · LW · GW

Stronger versions of seemingly-aligned AIs are probably effectively misaligned in the sense that optimization targets they formulate on long reflection (or superintelligent reflection) might be sufficiently different from what humanity should formulate. These targets don't concretely exist before they are formulated, which is very hard to do (and so won't yet be done by the time there are first AGIs), and strongly optimizing for anything that does initially exist is optimizing for a faulty proxy.

The arguments about dangers of this kind of misalignment seem to apply to humanity itself, to the extent that it can't be expected to formulate and pursue the optimization targets that it should, given the absence of their concrete existence at present. So misalignment in AI risk involves two different issues, difficulty of formulating optimization targets (an issue both for humans and for AIs) and difficulty of replicating in AIs the initial conditions for humanity's long reflection (as opposed to the AIs immediately starting to move in their own alien direction).

To the extent prosaic alignment seems to be succeeding, one of these problems is addressed, but not the other. Setting up a good process that ends up formulating good optimization targets becomes suddenly urgent with AI, which might actually have a positive side effect of reframing the issue in a way that makes complacency of value drift less dominant. Wei Dai and Robin Hanson seem to be gesturing at this point from different directions, how not doing philosophy correctly is liable to get us lost in the long term, and how getting lost in the long term is a basic fact of human condition and AIs don't change that.

Comment by Vladimir_Nesov on Why I no longer identify as transhumanist · 2024-02-07T06:46:20.277Z · LW · GW

Basic science and pure mathematics enable their own subsequent iterations without having them as explicit targets, or even while being unable to imagine these developments, all while doing the work crucial to making them possible.

Extensive preparation has never happened for a thing that is ready to be attempted experimentally, because in those cases we just do the experiments; there is no reason not to. With AGI, the reason not to is the unbounded blast radius of a failure, an unprecedented problem. Unprecedented things are less plausible, but unfortunately this one can't be expected to have happened before, because then you'd no longer be here to update on the observation.

If the blast radius is not unbounded, if most failures can be contained, then it's more reasonable to attempt to develop AGI in the usual way, without extensive preparation that doesn't involve actually attempting to build it. If preparation in general doesn't help, it doesn't help AGIs either, making them less dangerous and reducing the scope of failure, and so preparation for building them is not as needed. If preparation does help, it also helps AGIs, and so preparation is needed.

Comment by Vladimir_Nesov on Why I no longer identify as transhumanist · 2024-02-07T06:00:34.310Z · LW · GW

Consider an indefinite moratorium on AGI that awaits better tools that make building it a good idea rather than a bad idea. If there was a magic button that rewrote laws of nature to make this happen, would it be a good idea to press it? My point is that we both endorse pressing this button, the only difference is that your model says that building an AGI immediately is a good idea, and so the moratorium should end immediately. My model disagrees. This particular disagreement is not about the generations of people who forgo access to potential technology (where there is no disagreement), and it's not about feasibility of the magic button (which is a separate disagreement). It's about how this technology works, what works in influencing its design and deployment, and the effect it has on the world once deployed.

The crux of that disagreement seems to be about the importance of preparation in advance of doing a thing, compared to the process of actually doing the thing in the real world. A pause enables extensive preparation for building an AGI, and the high serial speed of AGI thought enables AGIs extensive preparation for acting on the world. If such preparation doesn't give a decisive advantage, a pause doesn't help, and AGIs don't rewrite reality in a year once deployed. If it does give a decisive advantage, a pause helps significantly, and a fast-thinking AGI shortly gains the affordance of overwriting humanity with whatever it plans to enact.

I see preparation as raising generations of giants to stand on the shoulders of, which in time changes the character of the practical projects that would be attempted, and the details we pay attention to as we carry out such projects. Yes, cryptography isn't sufficient to make systems secure, but absence of cryptography certainly makes them less secure, as does attempting to design cryptographic algorithms without taking the time to get good at it. This is the kind of preparation that makes a difference. Noticing that superintelligence doesn't imply supermorality and that alignment is a concern at all is an important development. Appreciating goodharting and corrigibility changes the safety properties of AIs that appear important, when looking into more practical designs that don't necessarily originate from these considerations. Deceptive alignment is a useful concern to keep in mind, even if in the end it turns out that practical systems don't have that problem. Experiments on GPT-2 sized systems still have a whole lot to teach us about interpretable and steerable architectures.

Without AGI interrupting this process, the kinds of things that people would attempt in order to build an AGI would be very different 20 years from now, and different yet again in 40, 60, 80, and 100 years. I expect some accumulated wisdom to steer such projects in better and better directions, even if the resulting implementation details remain sufficiently messy and make the resulting systems moderately unsafe, with some asymptote of safety where the aging generations make it worthwhile to forgo additional preparation.

Comment by Vladimir_Nesov on Why I no longer identify as transhumanist · 2024-02-07T04:18:26.562Z · LW · GW

Hypotheticals disentangle models from values. A pause is not a policy, not an attempt at a pause that might fail; it's the actual pause, the hypothetical. We can then look at the various hypotheticals and ask what happens in each, and which is better. Hopefully our values can handle the strain of out-of-distribution evaluation and don't collapse into the incoherence of goodharting, unable to say anything relevant about situations that our models consider impossible in actual reality.

In the hypothetical of a 100-year pause, the pause actually happens, even if this is impossible in actual reality. One of the things within that hypothetical is death of 4 generations of humans. Another is the AGI that gets built at the end. In your model, that AGI is no safer than the one that we build without the magic hypothetical of the pause. In my model, that AGI is significantly safer. A safer AGI translates into more value of the whole future, which is much longer than the current age of the universe. And an unsafe AGI now is less than helpful to those 4 generations.

AI control is similar in many ways to cybersecurity in that you are trying to limit the AI's access to functions that let it do bad things, and prevent the AI from seeing information that will allow it to fail.

That's the point of AI alignment as distinct from AI control. Your model says the distinction doesn't work. My model says it does. Therefore my model endorses the hypothetical of a pause.

Having endorsed a hypothetical, I can start paying attention to ways of moving reality in its direction. But that is distinct from a judgement about what the hypothetical entails.

Comment by Vladimir_Nesov on Why I no longer identify as transhumanist · 2024-02-05T23:47:51.359Z · LW · GW

"AI pause" talk [...] dooms [...] to more of the same

This depends on the model of risks. If risks without a pause are low, and they don't significantly reduce with a pause, then a pause makes things worse. If risks without a pause are high, but risks after a 20-year pause are much lower, then a pause is an improvement even for personal risk for sufficiently young people.

If risks without pause are high, risks after a 50-year pause remain moderately high, but risks after a 100-year pause become low, then not pausing trades significant measure of the future of humanity for a much smaller measure of survival of currently living humans. Incidentally, sufficiently popular cryonics can put a dent into this tradeoff for humanity, and cryonics as it stands can personally opt out anyone who isn't poor and lives in a country where the service is available.
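A toy expected-value calculation of that tradeoff; every number here is invented purely for illustration and is not an estimate from the comment:

```python
p_doom_now      = 0.7    # assumed risk if AGI is built without a pause
p_doom_pause100 = 0.1    # assumed risk after a 100-year pause

future_value = 1.0       # measure of humanity's long-term future
current_gens = 0.001     # added weight for currently living people surviving to see it

# Without a pause, current generations survive into the future if AGI goes well;
# with the pause, they don't, but the future itself is much more likely to go well.
ev_no_pause = (1 - p_doom_now) * (future_value + current_gens)
ev_pause    = (1 - p_doom_pause100) * future_value

print(ev_no_pause, ev_pause)
```

With these made-up numbers the pause wins easily, because the measure of the future at stake dwarfs the measure gained by current generations surviving; the conclusion flips only if the pause barely reduces risk or the future is weighted far less.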

Comment by Vladimir_Nesov on Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis · 2024-02-04T04:05:44.183Z · LW · GW

The most likely way to get to extremely safe AGI or ASI systems is not by humans creating them, it's by other less-safe AGI systems creating them.

This does seem more likely, but managing to sidestep the less-safe AGI part would be safer. In particular, it might be possible to construct a safe AGI by using safe-if-wielded-responsibly tool AIs (that are not AGIs), if humanity takes enough time to figure out how to actually do that.

Comment by Vladimir_Nesov on Why I no longer identify as transhumanist · 2024-02-03T14:51:46.043Z · LW · GW

the view that there’s probably no persisting identity over time anyway and in some sense I probably die and get reborn all the time in any case

In the long run, this is probably true for humans in a strong sense that doesn't depend on litigation of "personal identity" and "all the time". A related phenomenon is value drift. Neural nets are not a safe medium for keeping a person alive for a very long time without losing themselves, physical immortality is insufficient to solve the problem.

That doesn't mean that the problem isn't worth solving, or that it can't be solved. If AIs don't killeveryone, immortality or uploading is an obvious ask. But avoidance of value drift or of unendorsed long term instability of one's personality is less obvious. It's unclear what the desirable changes should be, but it's clear that there is an important problem here that hasn't been explored.

Comment by Vladimir_Nesov on Managing risks while trying to do good · 2024-02-03T01:09:57.249Z · LW · GW

Metaphorically, there is a question CEV tries to answer, and by "something like CEV" I meant any provisional answer to the appropriate question (so that CEV-as-currently-stated is an example of such an answer). Formulating an actionable answer is not a project humans would be ready to work on directly any time soon. So CEV is something to aim at by intention that defines CEV. If it's not something to aim at, then it's not a properly constructed CEV.

This lack of a concrete formulation is the reason goodharting and corrigibility seem salient in operationalizing the process of formulating it and making use of the formulation-so-far. Any provisional formulation of an alignment target (such as CEV-as-currently-stated) would be a proxy, and so any optimization according to such proxy should be wary of goodharting and be corrigible to further refinement.

The point of discussion of boundaries was in response to possible intuition that expected utility maximization tends to make its demands with great uniformity, with everything optimized in the same direction. Instead, a single goal may ask for different things to happen in different places, or to different people. It's a more reasonable illustration of goal aggregation than utilitarianism that sums over measures of value from different people or things.

Comment by Vladimir_Nesov on Managing risks while trying to do good · 2024-02-02T01:24:38.270Z · LW · GW

The issue with proxies for an objective is that they are similar to it. So an attempt to approximately describe the objective (such as an attempt to say what CEV is) can easily arrive at a proxy that has glaring goodharting issues. Corrigibility is one way of articulating a process that fixes this, optimization shouldn't outpace accuracy of the proxy, which could be improving over time.

Volition of humanity doesn't obviously put the values of the group before values of each individual, as we might put boundaries between individuals and between smaller groups of individuals, with each individual or smaller group having greater influence and applying their values more strongly within their own boundaries. There is then no strong optimization from values of the group, compared to optimization from values of individuals. This is a simplistic sketch of how this could work in a much more elaborate form (where the boundaries of influence are more metaphorical), but it grounds this issue in more familiar ideas like private property, homes, or countries.

Comment by Vladimir_Nesov on Managing risks while trying to do good · 2024-02-02T00:23:31.056Z · LW · GW

This seems mostly goodharting, how the tails come apart when optimizing or selecting for a proxy rather than for what you actually want. And people don't all want the same thing without disagreement or value drift. Near term practical solution is not optimizing too hard, and building an archipelago with membranes between people and between communities that bound the scope of stronger optimization. Being corrigible about everything might also be crucial. Longer term idealized solution is something like CEV, saying in a more principled and precise way what the practical solutions only gesture at, and executing on that vision at scale. This needs to be articulated with caution, as it's easy to stray into something that is obviously a proxy and very hazardous to strongly optimize.

Comment by Vladimir_Nesov on AI #49: Bioweapon Testing Begins · 2024-02-01T22:20:50.121Z · LW · GW

Right, a probable way of doing continued pretraining could as well be called "full-tuning", or just "tuning" (which is what you said, not "fine-tuning"), as opposed to "fine-tuning" that trains fewer weights. Though people seem unsure about "fine-tuning" implying that it's not full-tuning, resulting in terms like dense fine-tuning to mean full-tuning.

good terms to distinguish full-tuning the model in line with the original method of pretraining, and full layer LoRA adaptations that 'effectively' continue pretraining but are done in a different manner

You mean like ReLoRA, where full rank pretraining is followed by many batches of LoRA that get baked in? Fine-pretraining :-) It feels like a sparsity-themed training efficiency technique, which doesn't lose centrality points for being used in "pretraining". To my mind, tuning is cheaper adaptation, things that use OOMs less data than pretraining (even if it's full-tuning). So maybe the terms tuning/pretraining should be defined by the role those parts of training play in the overall process rather than by the algorithms involved? This makes fine-tuning an unnecessarily specific term, claiming both that it's tuning and that it trains fewer weights.
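For concreteness, the ReLoRA "bake in" step amounts to merging a low-rank product into the full weight matrix; a minimal numpy sketch, with shapes and init scales chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                        # model dim, LoRA rank (r << d)
W = rng.normal(size=(d, d))        # frozen full-rank weights
A = rng.normal(size=(r, d)) * 0.01 # low-rank pair learned during one LoRA phase
B = rng.normal(size=(d, r)) * 0.01

delta = B @ A                      # the phase's update; rank at most r
W_merged = W + delta               # "baked in": full weights absorb the update

# After merging, A and B would be re-initialized and training continues,
# so repeated phases can accumulate a higher-rank total change.
print(np.linalg.matrix_rank(delta))
```

Each merged phase contributes at most rank r, but the accumulated change across re-initialized phases is not rank-limited, which is why the overall process can still "effectively" play the role of pretraining.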

Comment by Vladimir_Nesov on Managing risks while trying to do good · 2024-02-01T20:52:19.052Z · LW · GW

Some things are best avoided entirely when you take their risks into account; some become worthwhile only if you manage their risks instead of denying their existence even to yourself. But even when denying risks gives positive outcomes in expectation, adequately managing those risks is better still. Unless society harms the project for acknowledging some risks, which it occasionally does. In that case, managing them without acknowledgement (which might require magic cognitive powers) is in tension with acknowledging them despite the expected damage from doing so.

Comment by Vladimir_Nesov on AI #49: Bioweapon Testing Begins · 2024-02-01T19:08:31.932Z · LW · GW

being tuned on a Llama 70B

Based on Mensch's response, Miqu is probably continued pretraining starting at Llama2-70B, a process similar to how CodeLlama or Llemma were trained. (Training on large datasets comparable with the original pretraining dataset is usually not called fine-tuning.)

less capable model trained on the same dataset

If Miqu underwent continued pretraining from Llama2-70B, the dataset won't be quite the same, unless mistral-medium is also pretrained after Llama2-70B (in which case it won't be released under Apache 2).

Comment by Vladimir_Nesov on AI #49: Bioweapon Testing Begins · 2024-02-01T17:44:10.088Z · LW · GW

Bard Gemini Pro (as it's called in lmsys arena) has access to the web and an unusual finetuning with a hyper-analytical character, it often explicitly formulates multiple subtopics in a reply and looks into each of them separately. In contrast the earlier Gemini Pro entries that are not Bard have a finetuning or prompt not suitable for the arena, often giving a single sentence or even a single word as a first response. Thus like Claude 2 (with its unlikable character) they operate at a handicap relative to base model capabilities. GPT-4 on lmsys arena doesn't have access to the web, and GPT-4 Turbo's newer knowledge from 2022-2023 seems more shallow than earlier knowledge, they probably didn't fully retrain the base model just for this release.

So both kinds of Gemini Pro are bad proxies for the placement of their base model on the leaderboard. In particular, if the Bard entry in the arena is in fact Gemini Pro and not Gemini Ultra, then Gemini Ultra with Bard Gemini Pro's advantages will probably beat the current GPT-4 Turbo (which doesn't have these advantages) even if Ultra is not smarter than GPT-4.

Comment by Vladimir_Nesov on Processor clock speeds are not how fast AIs think · 2024-01-30T15:09:36.265Z · LW · GW

This notion of thinking speed makes sense for large classes of tasks, not just specific tasks. And a natural class of tasks to focus on is the harder tasks among all the tasks both systems can solve.

So in this sense a calculator is indeed much faster than GPT-4, and GPT-4 is 2 OOMs faster than humans. An autonomous research AGI is capable of autonomous research, so its speed can be compared to humans at that class of tasks.

AI accelerates the pace of history only when it's capable of making the same kind of progress as humans in advancing history, at which point we need to compare their speed to that of humans at that activity (class of tasks). Currently AIs are not capable of that at all. If hypothetically 1e28 training FLOPs LLMs become capable of autonomous research (with scaffolding that doesn't incur too much latency overhead), we can expect that they'll be 1-2 OOMs faster than humans, because we know how they work. Thus it makes sense to claim that 1e28 FLOPs LLMs will accelerate history if they can do research autonomously. If AIs need to rely on extensive search on top of LLMs to get there, or if they can't do it at all, we can instead predict that they don't accelerate history, again based on what we know of how they work.

Comment by Vladimir_Nesov on Processor clock speeds are not how fast AIs think · 2024-01-30T14:15:05.440Z · LW · GW

current AIs are not thinking faster than humans [...] GPT-4 has higher token latency than GPT-3.5, but I think it's fair to say that GPT-4 is the model that "thinks faster"

This notion of thinking speed depends on the difficulty of a task. If one of the systems can't solve a problem at all, it's neither faster nor slower. If both systems can solve a problem, we can compare the time they take. In that sense, current LLMs are 1-2 OOMs faster than humans at the tasks both can solve, and much cheaper.

Old chess AIs were slower than humans good at chess. If future AIs can take advantage of search to improve quality, they might again get slower than humans at sufficiently difficult tasks, while simultaneously being faster than humans at easier tasks.

Comment by Vladimir_Nesov on Processor clock speeds are not how fast AIs think · 2024-01-30T01:05:56.308Z · LW · GW

Projects that involve interplanetary transit are not part of the development I discuss, so they can't slow it down. You don't need to wait for paint to dry if you don't use paint.

There are no additional pieces of infrastructure that need to be in place to make programmable cells, only their design and what modern biotech already has to manufacture some initial cells. It's a question of sample efficiency in developing simulation tools: how many observations does it take for the simulation tools to get good enough, if you had centuries to design the process of deciding what to observe and how to make use of the observations to improve the tools.

So a crux might be the impossibility of creating the simulation tools with data that can be collected in the modern world over a few months. It's an issue distinct from the inability to develop programmable cells.

Comment by Vladimir_Nesov on Processor clock speeds are not how fast AIs think · 2024-01-29T23:05:23.390Z · LW · GW

Machining equipment takes time to cut an engine, nano lathe a part, or if we are growing human organs to treat VIPs it takes months for them to grow.

That's why you don't do any of the slower things at all (in a blocking way), and instead focus on the critical path of controllable cells for macroscopic biotech or something like that, together with the experiments needed to train simulators good enough to design them. This enables exponentially scaling physical infrastructure once completed, which can be used to do all the other things. Simulation is not the methods of today, it's all the computational shortcuts to making the correct predictions about the simulated systems that the AGIs can come up with in subjective centuries of thinking, with a few experimental observations to ground the thinking. And once the initial hardware scaling project is completed, it enables much better simulation of more complicated things.

Comment by Vladimir_Nesov on Why I take short timelines seriously · 2024-01-29T22:41:53.495Z · LW · GW

there's significant weight on logarithmically diminishing returns such that the things that are strong than us never get so much stronger that we have no hope of understanding what they're doing

If autonomous research level AGIs are still 2 OOMs faster than humans, that leads to massive scaling of hardware within years even if they are not smarter, at which point it's minds the size of cities. So the probable path to weak takeoff is a slow AGI that doesn't get faster on hardware of the near future, and being slow it won't soon help scale hardware.

Comment by Vladimir_Nesov on Processor clock speeds are not how fast AIs think · 2024-01-29T22:16:37.382Z · LW · GW

When you design a thing, you can intentionally make it more predictable and faster to test, in particular with modularity. If the goal is designing cells that grow and change in controllable ways, all experiments are tiny. Like with machine learning, new observations from the experiments generalize by improving the simulation tools, not just object level designs. And much more advanced theory of learning should enable much better sample efficiency with respect to external data.

If a millionfold speedup were already feasible on current hardware, it wouldn't take hardware advancement, and as a milestone it would indicate no hardware benefit for simulation. That point responded to the hypothetical where there is already massive scaling in hardware compared to today (such as through macroscopic biotech to scale physical infrastructure), which should, as another consequence, make simulation of physical designs much better (on its own hardware specialized for being good at simulation). For example, this is where I expect uploading to become feasible to develop, not at the 300x speedup stage of software-only improvement, because simulating wild systems is harder than designing something predictable.

(This is exploratory engineering not forecasting, I don't actually expect human level AGI without superintelligence to persist that long, and if nanotech is possible I don't expect scaling of macroscopic biotech. But neither seems crucial.)

Comment by Vladimir_Nesov on Processor clock speeds are not how fast AIs think · 2024-01-29T21:04:06.603Z · LW · GW

Any testing can be done in simulation, as long as you have a simulator and it's good enough. A few-hundred-times speedup in thinking allows very quickly writing very good specialized software for learning and simulation of all relevant things, based on theory that's substantially better. The speed of simulation might be a problem, and there's probably a need for physical experiments to train the simulation models (but not to directly debug object-level engineering artifacts).

Still, in the physical world the activity of an unfettered 300x-speed human-level AGI probably looks like building tools for building tools, without scaling production and on the first try, rather than cycles of experiments and reevaluation and productization. I suspect macroscopic biotech might be a good target. It's something obviously possible (as in animals) and probably amenable to specialized simulation. This might take some experiments to pin down, but probably not years of experiments, as at every step it takes no time at all to very judiciously choose what data to collect next. There is already a bootstrapping technology (fruit fly biomass doubles every 2 days), energy from fusion will help with scaling, and once manufactured, cells can reconfigure.

A millionfold speedup in thinking (still assuming no superintelligence) probably requires hardware that implies ability to significantly speed up simulations.

Comment by Vladimir_Nesov on Processor clock speeds are not how fast AIs think · 2024-01-29T16:24:49.202Z · LW · GW

Throughput doesn't straightforwardly accelerate history, serial speedup does. At a serial speedup of 10x-100x, decades pass in a year. If an autonomous researcher AGI develops better speculative decoding and other improvements during this time, the speedup quickly increases once the process starts, though it might still remain modest without changing hardware or positing superintelligence, only centuries a year or something.

For neurons-to-transistors comparison, probably both hardware and algorithms would need to change to make this useful, but then the critical path length of transformers is quite low. Two matrices of size NxN only need on the order of log(N) sequential operations to multiply them. It's not visibly relevant with modern hardware, but laws of physics seem to allow hardware that makes neural nets massively faster, probably a millionfold speedup of human level thought is feasible. This is a long term project for those future centuries that happen in a year.
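The log(N) claim can be made concrete: each entry of an NxN matrix product is a length-N dot product, which is one parallel layer of multiplies followed by a balanced addition tree of depth ceil(log2 N). A toy sketch (function names are illustrative):

```python
import math

def reduction_depth(n):
    """Sequential depth of a balanced pairwise-sum tree over n values."""
    depth = 0
    while n > 1:
        n = (n + 1) // 2  # one parallel layer of pairwise additions
        depth += 1
    return depth

def pairwise_sum(xs):
    """Sum via a balanced tree: same result as sum(xs),
    but the critical path is logarithmic, not linear."""
    while len(xs) > 1:
        xs = [xs[i] + xs[i + 1] if i + 1 < len(xs) else xs[i]
              for i in range(0, len(xs), 2)]
    return xs[0]

# For N = 1024, the addition tree is only 10 sequential steps deep,
# even though it performs ~N additions in total.
N = 1024
assert reduction_depth(N) == math.ceil(math.log2(N))
assert pairwise_sum(list(range(8))) == sum(range(8))
```

Total work stays O(N) per dot product; only the serial critical path shrinks, which is the distinction between throughput and serial speedup made above.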

Comment by Vladimir_Nesov on Against Nonlinear (Thing Of Things) · 2024-01-21T20:28:44.380Z · LW · GW

The important part isn't assertions (which honestly I don't see here), it's asking the question. Like with advice, it's useless when taken as a command without argument, but as framing it's asking whether you should be doing a thing more or less than you normally do it, and that can be valuable by drawing attention to that question, even when the original advice is the opposite of what makes sense.

With discussion of potential issues of any kind, having norms that call for avoiding such discussion or for burdening it with rigor requirements makes it go away, and so the useful question of what the correct takes are remains unexplored.

Comment by Vladimir_Nesov on Managing catastrophic misuse without robust AIs · 2024-01-21T12:21:08.540Z · LW · GW

It spits out much scarier information than a google search supplies. Much.

I see a sense in which GPT-4 is completely useless for serious programming in the hands of a non-programmer who wouldn't be capable/inclined to become a programmer without LLMs, even as it's somewhat useful for programming (especially with unfamiliar but popular libraries/tools). So the way in which a chatbot helps needs qualification.

One possible measure is how much a chatbot increases the fraction of some demographic that's capable of some achievement within some amount of time. All these "changes the difficulty by 4x" or "by 1.25x" need to mean something specific, otherwise there is hopeless motte-and-bailey that allows credible reframing of any data as fearmongering. That is, even when it's only intuitive guesses, the intuitive guesses should be about a particular meaningful thing rather than level of scariness. Something prediction-marketable.

Comment by Vladimir_Nesov on Against Nonlinear (Thing Of Things) · 2024-01-21T11:48:04.269Z · LW · GW

I'd like to at least see some numbers before you declare something immoral and dangerous to discuss!

Discussing hypothetical dangers shouldn't require numbers. It's probably not so dangerous to discuss hypothetical dangers that they shouldn't be discussed when there are no numbers.

Comment by Vladimir_Nesov on TurnTrout's shortform feed · 2024-01-20T11:36:16.566Z · LW · GW

A bad map that expresses the territory with great uncertainty can be confidently called a bad map; calling it a good map is clearly wrong. In that sense the shoggoth imagery reflects the quality of the map, and as it's clearly a bad map, better imagery would be misleading about the map's quality. Even if the underlying territory is lovely, this isn't known, unlike the disastrous quality of the map of the territory, whose lack of quality is known with much more confidence and in much greater detail. Here be dragons.

(This is one aspect of the meme where it seems appropriate. Some artist's renditions, including the one you used, channel LeCake, which your alternative image example loses, but obviously the cake is nicer than the shoggoth.)

Comment by Vladimir_Nesov on Estimating efficiency improvements in LLM pre-training · 2024-01-20T10:54:47.537Z · LW · GW

My point is that algorithmic improvements (in the way I defined them) are very limited, even in the long term, and that there hasn't been a lot of algorithmic improvement in this sense in the past as well. The issue is that the details of this definition matter, if you start relaxing them and start interpreting "algorithmic improvement" more informally, you become able to see more algorithmic improvement in the past (and potentially in the future).

One take is how in the past, data was often limited and carefully prepared, models didn't scale beyond all reasonable compute, and there wasn't patience to keep applying compute once improvement seemed to stop during training. So the historical progress is instead the progress in willingness to run larger experiments and in ability to run larger experiments, because you managed to prepare more data or to make your learning setup continue working beyond the scale that seemed reasonable before.

This isn't any algorithmic progress at all in the sense I discuss, and so what we instead observe about algorithmic progress in the sense I discuss is its historical near-absence, suggesting that it won't be appearing in the future either. You might want to take a look at Mosaic and their Composer library; they've done the bulk of the work on collecting the real low-hanging algorithmic improvement fruit (just be careful about the marketing that looks at the initial fruit-picking spree and presents it as endless bounty).

Data doesn't have this problem, that's how we have 50M parameter chess or Go models that match good human performance even though they are tiny by LLM standards. Different data sources confer different levels of competence on the models, even as you still need approximately the same scale to capture a given source in a model (due to lack of significant algorithmic improvement). But nobody knows how to scalably generate better data for LLMs. There's Microsoft's phi that makes some steps in that direction by generating synthetic exercises, which are currently being generated by stronger models trained on much more data than the amount of exercises being generated. Possibly this kind of thing might eventually take off and become self-sufficient, so that a model manages to produce a stronger model. Or alternatively, there might be some way to make use of the disastrous CommonCrawl to generate a comparable amount of data of the usual good level of quality found in better sources of natural data, without exceeding the level of quality found in good natural data. And then there's RL, which can be thought of as a way of generating data (with model-free RL through hands-on exploration of the environment that should be synthetic to get enough data, and with model-based RL through dreaming about the environment, which might even be the real world in practice).

But in any case, the theory of improvement here is a better synthetic data generation recipe that makes a better source, not a better training recipe for how to capture a source into a model.

Comment by Vladimir_Nesov on Estimating efficiency improvements in LLM pre-training · 2024-01-19T21:45:27.983Z · LW · GW

Before Chinchilla scaling, nobody was solving the relevant optimization problem. Namely: given a perplexity target, adjust all parameters, including model size and geometry, sparsity, and amount of data (sampled from a fixed exceedingly large dataset), to hit the perplexity target with as few FLOPs as possible. Do this for multiple perplexities, and make a perplexity-FLOP plot of optimized training runs to be able to interpolate. Given a different architecture with its own different plot, the estimated improvement in these FLOPs for each fixed perplexity within some range is then the training efficiency improvement valid within that range of perplexities. This might not be the most important measurement in practice, but it makes the comparison between very different architectures meaningful at least in some sense.
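This optimization problem can be sketched numerically using the parametric loss fit from the Chinchilla paper, L(N, D) = E + A/N^α + B/D^β with training compute C ≈ 6ND. The constants below are the published fit; the helper function and grid are illustrative, not from any actual codebase:

```python
import numpy as np

# Chinchilla parametric fit (Hoffmann et al. 2022); treat as illustrative.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def min_flops_for_loss(target_loss, n_grid):
    """Given a loss target, sweep model sizes N, solve for the data D
    needed to hit the target at each N, and return the cheapest
    (N, D, C) -- the optimization problem described above."""
    best = None
    for N in n_grid:
        resid = target_loss - E - A / N**alpha
        if resid <= 0:
            continue  # this N can't hit the target even with infinite data
        D = (B / resid) ** (1 / beta)  # data needed at this model size
        C = 6 * N * D                  # approximate training FLOPs
        if best is None or C < best[2]:
            best = (N, D, C)
    return best

n_grid = np.logspace(8, 12, 400)  # 100M to 1T parameters
N, D, C = min_flops_for_loss(2.0, n_grid)
# Along this frontier, optimal data grows faster than optimal model size
# (alpha != beta), so compute-optimal D substantially exceeds N.
```

Repeating the sweep for several loss targets yields exactly the perplexity-FLOP plot of optimized runs; comparing two architectures' fitted constants then gives the efficiency ratio at each fixed perplexity.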

Without MoE, the gains since the GPT-3 recipe seem to be about 6x (see Figure 4 in the Mamba paper). I'm not sure what the MoE gains are on top of that; the scaling laws I've seen don't quite measure the thing I've defined above (I'd be grateful for a pointer). Going from FP32 to BF16 to something effectively 6-bit with Microscaling (see section 4.5) without loss of training quality is another potential win (if it's implemented in future hardware, or if there is a software-only way of getting similar results without an overhead).

But I'm not sure there is much more there, once the correct optimization problem is being solved and the low hanging fruit is collected, and if the dataset remains natural. The historical algorithmic progress is misleading, because it wasn't measuring what happens when you use unlimited data and can vary model size to get as much compression quality as possible out of given compute.

Comment by Vladimir_Nesov on AlphaGeometry: An Olympiad-level AI system for geometry · 2024-01-18T15:55:37.366Z · LW · GW

This is another example of how matching specialized human reasoning skill seems routinely feasible with search guided by 100M scale networks trained for a task a human would spend years mastering. These tasks seem specialized, but it's plausible all breadth of human activity can be covered with a reasonable number of such areas of specialization. What's currently missing is automation of formulation and training of systems specialized in any given skill.

The often touted surprisingly good human sample efficiency might just mean that when training is set up correctly, it's sufficient to train models of size comparable to the amount of external text data that a human might need to master a skill, rather than models of the size comparable to a human brain. This doesn't currently work for training systems that autonomously do research and produce new AI experiments and papers, and in practice the technology might take a very different route. But once it does work, surpassing human performance might fail to require millions of GPUs even for training.

Comment by Vladimir_Nesov on AI doing philosophy = AI generating hands? · 2024-01-15T13:22:09.944Z · LW · GW

Philosophy and to some extent even decision theory are more like aspects of value content. AGIs and ASIs have the capability to explore them, if only they had the motive. Not taking away this option and not disempowering its influence doesn't seem very value-laden, so it's not pivotal to explore it in advance, even though it would help. Avoiding disempowerment is sufficient to eventually get around to industrial production of high quality philosophy. This is similar to how the first generations of powerful AIs shouldn't pursue CEV, and more to the point don't need to pursue CEV.

Comment by Vladimir_Nesov on Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · 2024-01-15T11:31:13.945Z · LW · GW

I think human level AGIs being pivotal in shaping ASIs is very likely if AGIs get developed in the next few years as largely the outcome of scaling, and still moderately likely overall. If that is the case, what matters is alignment of human level AGIs and the social dynamics of their deployment and their own activity. So control despite only being aligned as well as humans are (or somewhat better) might be sufficient, as one of the things AGIs might work on is improving alignment.

The point about deceptive alignment being a special case of trustworthiness goes both ways, a deceptively aligned AI really can be a good ally, as long as the situation is maintained that prevents AIs from individually getting absolute power, and as long as the AIs don't change too much from that baseline. Which are very difficult conditions to maintain while the world is turning upside down.

Comment by Vladimir_Nesov on Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · 2024-01-15T10:47:42.850Z · LW · GW

It seems very weird to ascribe a generic "bad takes overall" summary to that group, given that you yourself are directly part of it.

This sentence channels influence of an evaporative cooling norm (upon observing bad takes, either leave the group or conspicuously ignore the bad takes), also places weight on acting on the basis of one's identity. (I'm guessing this is not in tune with your overall stance, but it's evidence of presence of a generator for the idea.)