As you sort of refer to, it's also the case that the 7.5 hour run time can be paid once, and then remain true of the system. It's a one-time cost!
So even if we have 100 different things we need to prove for a higher level system, then even if it takes a year of engineering and mathematics research time plus a day or a month of compute time to get a proof, we can do them in parallel, and this isn't much of a bottleneck, if this approach is pursued seriously. (Parallelization is straightforward if we can, for example, take the guarantee provided by one proof as an assumption in others, instead of trying to build a single massive proof.) And each such system built allows for provability guarantees for systems built with that component, if we can build composable proof systems, or can separate the necessary proofs cleanly.
Yes - I didn't say it was hard without AI, I said it was hard. Using the best tech in the world, humanity doesn't *even ideally* have ways to get AI to design safe useful vaccines in less than months, since we need to do actual trials.
I know someone who has done lots of reporting on lab leaks, if that helps?
Also, there are some "standard" EA-adjacent journalists who you could contact / someone could introduce you to, if it's relevant to that as well.
Vaccine design is hard, and requires lots of work. Seems strange to assert that someone could just do it on the basis of a theoretical design. Viral design, though, is even harder, and to be clear, we've never seen anyone build one from first principles; the most we've seen is modification of extant viruses in minor ways where extant vaccines for the original virus are likely to work at least reasonably well.
I have a lot more to say about this, and think it's worth responding to in much greater detail, but I think that overall, the post criticizes Omohundro and Tegmark's more extreme claims somewhat reasonably, though very uncharitably, and then assumes that other proposals which seem to be related, especially the Dalrymple et al. approach, are essentially the same, and doesn't engage with the specific proposal at all.
To be very specific about how I think the post is unreasonable: there are a number of places where a seeming steel-man version of the proposals is presented, and then this steel-manned version, rather than the initial proposal for formal verification, is attacked. But this amounts to a straw-man criticism of the actual proposals being discussed!
For example, this post suggests that arbitrary DNA could be proved safe by essentially impossible modeling ("on-demand physical simulations of entire human bodies (with their estimated 36 trillion cells [9]), along with the interactions between the cells themselves and the external world and then run those simulations for years"). This is true - that would work - but the proposal ostensibly being criticized was to check narrower questions about whether DNA synthesis is being used to produce something harmful. And Dalrymple et al. explained explicitly elsewhere in the paper what they had in mind ("Examples include machines provably impossible to login to without correct credentials, DNA synthesizers that provably cannot synthesize certain pathogens, and AI hardware that is provably geofenced, time-limited (“mortal”) or equipped with a remote-operated throttle or kill-switch. Provably compliant sensors can be specified to ensure “zeroization”, in which tampering with PCH is guaranteed to cause detection and erasure of private keys.")
But that premise falls apart as soon as a large fraction of those (currently) highly motivated (relatively) smart tech workers can only get jobs in retail or middle management.
Yeah, I think the simplest thing for image generation is for model hosting providers to use a separate tool - and lots of work on that already exists. (see, e.g., this, or this, or this, for different flavors.) And this is explicitly allowed by the bill.
For text, it's harder to do well, and you only get weak probabilistic identification, but it's also easy to implement an Aaronson-like scheme, even if doing it really well is harder. (I say easy because I'm pretty sure I could do it myself, given, say, a month working with one of the LLM providers, and I'm wildly underqualified to do software dev like this.)
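To give a flavor of what I mean by an Aaronson-like scheme, here is a toy sketch (my own simplification, not Aaronson's actual implementation - the function names and key handling are illustrative assumptions): seed a pseudorandom generator from a secret key plus the recent context, then use it to derandomize sampling, so that anyone holding the key can later detect the statistical bias.

```python
import hashlib
import random

def seeded_rng(context_tokens, key="secret-key"):
    # Derive a deterministic seed from the secret key plus recent context.
    digest = hashlib.sha256((key + "|" + " ".join(context_tokens)).encode()).hexdigest()
    return random.Random(int(digest, 16) % 2**32)

def watermarked_sample(context_tokens, probs, key="secret-key"):
    """Pick the token index maximizing r_i ** (1 / p_i) (the 'Gumbel trick').

    Marginally this is equivalent to sampling from probs (assuming all probs
    are positive), but a party holding the key can recompute the r_i values
    and statistically detect that generated text is biased toward them.
    """
    rng = seeded_rng(context_tokens, key)
    rs = [rng.random() for _ in probs]
    return max(range(len(probs)), key=lambda i: rs[i] ** (1.0 / probs[i]))
```

The key property is that the same context and key always yield the same choice, which is exactly what makes after-the-fact detection possible - and why this only gives weak probabilistic identification on short texts.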
- Internet hosting platforms are responsible for ensuring indelible watermarks.
The bill requires that "Generative AI hosting platforms shall not make available a generative AI system that does not allow a GenAI provider, to the greatest extent possible and either directly providing functionality or making available the technology of a third-party vendor, to apply provenance data to content created or substantially modified by the system"
This means that sites running GenAI models need to allow the GenAI systems to implement their required watermarking, not that hosting providers (imgur, reddit, etc.) need to do so. Less obviously good, but still important: it also doesn't require the GenAI hosting provider to ensure the watermark is indelible, just that they include watermarking via either the model, or a third-party tool, when possible.
LLMs are already moderately-superhuman at the task of predicting next tokens. This isn't sufficient to help solve alignment problems. We would need them to meet the much much higher bar of being moderately-superhuman at the general task of science/engineering.
We also need the assumption - which is definitely not obvious - that significant intelligence increases are relatively close to achievable. Superhumanly strong math skills presumably don't let AI solve NP problems in P time, and it's similarly plausible - though far from certain - that really good engineering skill tops out somewhere only moderately above human ability due to intrinsic difficulty, and really good deception skills top out somewhere not enough to subvert the best systems that we could build to do oversight and detect misalignment. (On the other hand, even with these objections being correct, it would only show that control is possible, not that it is likely to occur.)
Live like life might be short. More travel even when it means missing school, more hugs, more things that are fun for them.
Optimizing for both impact and personal fun, I think this is probably directionally good advice for the types of analytic people who think about the long term a lot, regardless of kids. (It's bad advice for people who aren't thinking very long term already, but that's not who is reading this.)
First, I think this class of work is critical for deconfusion, which is critical if we need a theory for far more powerful AI systems, rather than for very smart but still fundamentally human level systems.
Secondly, concretely, it seems that very few other approaches to safety have the potential to provide enough fundamental understanding to allow us to make strong statements about models before they are fully trained. This seems like a critical issue if we are concerned about very strong models that could pose risks during testing, or possibly even during training. And as far as I'm aware, nothing in the interpretability and auditing spaces has a real claim to be able to make clear statements about those risks, other than perhaps to suggest interim testing during model training - which could work, if a huge amount of such work is done, but seems very unlikely to happen.
Edit to add: Given the votes on this, what specifically do people disagree with?
Ideally, statements should be at least two of true, necessary/useful, and kind. I agree that this didn't breach confidentiality of some sort, and yet I think that people should generally follow a policy where we don't publicize random non-notable people's names without explicit permission when discussing them in a negative light - and the story might attempt to be sympathetic, but it certainly isn't complimentary.
Unlikely, since he could have walked away with a million dollars instead of doing this. (Per Zvi's other post, "Leopold was fired right before his cliff, with equity of close to a million dollars. He was offered the equity if he signed the exit documents, but he refused.")
The most obvious reason for skepticism of the impact that would cause follows.
David Manheim: I do think that Leopold is underrating how slow much of the economy will be to adopt this. (And so I expect there to be huge waves of bankruptcies of firms that are displaced / adapted slowly, and resulting concentration of power- but also some delay as assets change hands.)
I do not think Leopold is making that mistake. I think Leopold is saying a combination of the remote worker being a seamless integration, and also not much caring about how fast most businesses adapt to it. As long as the AI labs (and those in their supply chains?) are using the drop-in workers, who else does so mostly does not matter. The local grocery store refusing to cut its operational costs won’t much postpone the singularity.
I want to clarify the point I was making - I don't think that this directly changes the trajectory of AI capabilities, I think it changes the speed at which the world wakes up to those possibilities. That is, I think that in worlds with the pace of advances he posits, the impacts on the economy lag the AI advances themselves, and we get faster capabilities takeoff than economic impacts that would make the transformation fully obvious to the rest of the world.
The more important point, in my mind, is what this means for geopolitics, which I think aligns with your skepticism. As I said responding to Leopold's original tweet: "I think that as the world wakes up to the reality, the dynamics change. The part of the extensive essay I think is least well supported, and least likely to play out as envisioned, is the geopolitical analysis. (Minimally, there's at least as much uncertainty as AI timelines!)"
I think the essay showed lots of caveats and hedging about the question of capabilities and timelines, but then told a single story about geopolitics - one that I think is both unlikely, and that fails to notice the critical fact that this is describing a world where government is smart enough to act quickly, but not smart enough to notice that we all die very soon. To quote myself again, "I think [this describes] a weird world where military / government "gets it" that AGI will be a strategic decisive advantage quickly enough to nationalize labs, but never gets the message that this means it's inevitable that there will be loss of control at best."
I don't really have time, but I'm happy to point you to a resource to explain this: https://oyc.yale.edu/economics/econ-159
And I think I disagreed with the concepts inasmuch as you are saying something substantive, but the terms were confused, and I suspect, but may be wrong, that if laid out clearly, there wouldn't be any substantive conclusion you could draw from the types of examples you're thinking of.
That seems a lot like Davidad's alignment research agenda.
Agree that it's possible to have small amounts of code describing very complex things, and as I said originally, it's certainly partly spaghetti towers. However, to expand on my example, for something like a down-and-in European call option, I can give you a two line equation for the payout, or a couple lines of easily understood python code with three arguments (strike price, min price, final price) to define the payout, but it takes dozens of pages of legalese instead.
My point was that the legal system contains lots of that type of what I'd call fake complexity, in addition to the real complexity from references and complex requirements.
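For concreteness, the payout code I had in mind is roughly this (a sketch; the barrier level is a fourth input I'm assuming here, and the specific default value is illustrative):

```python
def down_and_in_call_payout(strike, min_price, final_price, barrier=90.0):
    # The option "knocks in" only if the price ever fell to the barrier;
    # otherwise it expires worthless regardless of the final price.
    return max(final_price - strike, 0.0) if min_price <= barrier else 0.0
```

Two lines of logic - versus dozens of pages of legalese for the equivalent contract.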
Very happy to see a concrete outcome from these suggestions!
I'll note that I think this is a mistake that lots of people working in AI safety have made, ignoring the benefits of academic credentials and prestige because of the obvious costs and annoyance. It's not always better to work in academia, but it's also worth really appreciating the costs of not doing so in foregone opportunities and experience, as Vanessa highlighted. (Founder effects matter; Eliezer had good reasons not to pursue this path, but I think others followed that path instead of evaluating the question clearly for their own work.)
And in my experience, much of the good work coming out of AI Safety has been sidelined because it fails the academic prestige test, and so it fails to engage with academics who could contribute or who have done closely related work. Other work avoids or fails the publication process because the authors don't have the right kind of guidance and experience to get their papers in to the right conferences and journals, and not only is it therefore often worse for not getting feedback from peer review, but it doesn't engage others in the research area.
There aren't good ways to do this automatically for text, and state of the art is rapidly evolving.
https://arxiv.org/abs/2403.05750v1
For photographic images which contain detailed images of humans, or which contain non-standard objects with fine details, there are still some reasonably good heuristics for when AIs will mess up those details, but I'm not sure how long they will remain valid.
This is one of the key reasons that the term alignment was invented and used instead of control; I can be aligned with the interests of my infant, or my pet, without any control on their part.
Most of this seems to be subsumed in the general question of how do you do research, and there's lot of advice, but it's (ironically) not at all a science. From my limited understanding of what goes on in the research groups inside these companies, it's a combination of research intuition, small scale testing, checking with others and discussing the new approach, validating your ideas, and getting buy-in from people higher up that it's worth your and their time to try the new idea. Which is the same as research generally.
At that point, I'll speculate and assume whatever idea they have is validated in smaller but still relatively large settings. For things like sample efficiency, they might, say, train a GPT-3 size model, which now costs only a fraction of the researcher's salary to do. (Yes, I'm sure they all have very large compute budgets for their research.) If the results are still impressive, I'm sure there is lots more discussion and testing before actually using the method in training the next round of frontier models that cost huge amounts of money - and those decisions are ultimately made by the teams building those models, and management.
It seems like you're not being clear about how you are thinking about the cases, or are misusing some of the terms. Nash Equilibria exist in zero-sum games, so those aren't different things. If you're familiar with how to do game theory, I think you should carefully set up what you claim the situation is in a payoff matrix, and then check whether, given the set of actions you posit people have in each case, the scenario is actually a Nash equilibrium in the cases you're calling Nash equilibrium.
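To make that exercise concrete, here is a minimal sketch of the check I'm suggesting (the helper name and the prisoner's-dilemma payoffs are my own illustration, not anything from the discussion above):

```python
def is_pure_nash(payoffs, actions, profile):
    """Check whether a pure-strategy profile is a Nash equilibrium in a
    two-player game given as {(action1, action2): (payoff1, payoff2)}."""
    for player in (0, 1):
        current = payoffs[profile][player]
        for alt in actions[player]:
            deviation = list(profile)
            deviation[player] = alt
            if payoffs[tuple(deviation)][player] > current:
                return False  # a profitable unilateral deviation exists
    return True

# Prisoner's dilemma: mutual defection is the unique pure Nash equilibrium,
# even though the game is not zero-sum.
pd = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
actions = (("C", "D"), ("C", "D"))
```

Writing out the matrix this way makes it immediately checkable whether a claimed equilibrium actually survives unilateral deviations.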
...but there are a number of EAs working on cybersecurity in the context of AI risks, so one premise of the argument here is off.
And a rapid response site for the public to report cybersecurity issues and account hacking generally would do nothing to address the problems that face the groups that most need to secure their systems, and wouldn't even solve the narrower problem of reducing those hacks, so this seems like the wrong approach even given the assumptions you suggest.
I agree that your question is weird and confused, and agree that if that were the context, my post would be hard to understand. But I think it's a bad analogy! That's because there are people who have made analogies between AI and Bio very poorly, and it's misleading and leading to sloppy thinking. In my experience seeing discussions on the topic, either the comparisons are drawn carefully and the relevant dissimilarities are discussed clearly, or they are bad analogies.
To stretch your analogy, if the context were that I'd recently heard people say "Steve and David are both people I know, and if you don't like Steve, you probably won't like David," and also "Steve and David are both concerned about AI risks, so they agree on how to discuss the issue," I'd wonder if there was some confusion, and I'd feel comfortable saying that in general, Steve is an unhelpful analog for David, and all these people should stop and be much more careful in how they think about comparisons between us.
I agree with you that analogies are needed, but they are also inevitably limited. So I'm fine with saying "AI is concerning because its progress is exponential, and we have seen from COVID-19 that we need to intervene early," or "AI is concerning because it can proliferate as a technology like nuclear weapons," or "AI is like biological weapons in that countries will pursue and use these because they seem powerful, without appreciating the dangers they create if they escape control." But what I am concerned that you are suggesting is that we should make the general claim "AI poses uncontrollable risks like pathogens do," or "AI needs to be regulated the way biological pathogens are," and that's something I strongly oppose. By ignoring all of the specifics, the analogy fails.
In other words, "while I think the disanalogies are compelling, comparison can still be useful as an analytic tool - while keeping in mind that the ability to directly learn lessons from biorisk to apply to AI is limited by the vast array of other disanalogies."
I said:
disanalogies listed here aren’t in and of themselves reasons that similar strategies cannot sometimes be useful, once the limitations are understood. For that reason, disanalogies should be a reminder and a caution against analogizing, not a reason on its own to reject parallel approaches in the different domains.
You seem to be simultaneously claiming that I had plenty of room to make a more nuanced argument, and then saying you think I'm saying something which exactly the nuance I included seems to address. Yes, people could cite the title of the blog post to make a misleading claim, assuming others won't read it - and if that's your concern, perhaps it would be enough to change the title to "Biorisk is Often an Unhelpful Analogy for AI Risk," or "Biorisk is Misleading as a General Analogy for AI Risk"?
I agree that we do not have an exact model for anything in immunology, unlike physics, and there is a huge amount of uncertainty. But that's different than saying it's not well-understood; we have clear gold-standard methods for determining answers, even if they are very expensive. This stands in stark contrast to AI, where we don't have the ability to verify that something works or is safe at all without deploying it, and even that isn't much of a check on its later potential for misuse.
But aside from that, I think your position is agreeing with mine much more than you imply. My understanding is that we have newer predictive models which can give uncertain but fairly accurate answers to many narrow questions. (Older, non-ML methods also exist, but I'm less familiar with them.) In your hypothetical case, I expect that the right experts can absolutely give indicative answers about whether a novel vaccine peptide is likely or unlikely to have cross-reactivity with various immune targets, and the biggest problem is that it's socially unacceptable to assert confidence in anything short of a tested and verified case. But the models can get, in the case of the Zhang et al. paper above, 70% accurate answers, which can help narrow the problem for drug or vaccine discovery, though they do then need to be followed with in vitro tests and trials.
I'm arguing exactly the opposite; experts want to make comparisons carefully, and those trying to transmit the case to the general public should, at this point, stop using these rhetorical shortcuts that imply wrong and misleading things.
On net, the analogies being used to try to explain are bad and misleading.
I agree that I could have tried to convey a different message, but I don't think it's the right one. Anyone who wants to dig in can decide for themselves, but you're arguing that ideal reasoners won't conflate different things and can disentangle the similarities and differences, and I agree, but I'm noting that people aren't doing that, and others seem to agree.
I don't understand why you disagree. Sure, pathogens can have many hosts, but hosts generally follow the same logic as for humans in terms of their attack surface being static and well adapted, and are similarly increasingly understood.
That doesn't seem like "consistently and catastrophically," it seems like "far too often, but with thankfully fairly limited local consequences."
BSL isn't the thing that defines "appropriate units of risk" - that's pathogen risk-group levels - and I agree that those are a problem because they focus on pathogen lists rather than actual risks. I actually think BSL levels are good at what they do, and the problem is regulation and oversight, which is patchy, as well as transparency, of which there is far too little. But those are issues with oversight, not with the types of biosecurity measures that are available.
If you're appealing to OpenPhil, it might be useful to ask one of the people who was working with them on this as well.
And you've now equivocated between "they've induced an EA cause area" and a list of the range of risks covered by biosecurity - not what their primary concerns are - and citing this as "one of them." I certainly agree that biosecurity levels are one of the things biosecurity is about, and that "the possibility of accidental deployment of biological agents" is a key issue, but that's incredibly far removed from the original claim that the failure of BSL levels induced the cause area!
I mean, I'm sure something more restrictive is possible.
But what? Should we insist that the entire time someone's inside a BSL-4 lab, we have a second person who is an expert in biosafety visually monitoring them to ensure they don't make mistakes? Or should their air supply not use filters and completely safe PAPRs, and instead feed them outside air through a tube that restricts their ability to move around? (Edit to Add: These are both already required in BSL-4 labs. When I said I don't know of anything more restrictive they could do, I was being essentially literal - they do everything including quite a number of unreasonable things to prevent human infection, short of just not doing the research.)
Or do you have some new idea that isn't just a ban with more words?
"lists of restrictions" are a poor way of managing risk when the attack surface is enormous
Sure, list-based approaches are insufficient, but they have relatively little to do with biosafety levels of labs; they have to do with risk groups, which are distinct, but often conflated. (So Ebola or Smallpox isn't a "BSL-4" pathogen, because there is no such thing.)
I just meant "gain of function" in the standard, common-use sense—e.g., that used in the 2014 ban on federal funding for such research.
That ban didn't go far enough, since it only applied to 3 pathogen types, and wouldn't have banned what Wuhan was doing with novel viruses, since that wasn't working with SARS or MERS, it was working with other species of virus. So sure, we could enforce a broader version of that ban, but getting a good definition that's both extensive enough to prevent dangerous work and that doesn't ban obviously useful research is very hard.
Having written extensively about it, I promise you I'm aware. But please, tell me more about how this supports the original claim which I have been disagreeing with, that these class of incidents were or are the primary concern of the EA biosecurity community, the one that led to it being a cause area.
The OP claimed a failure of BSL levels was the single thing that induced biorisk as a cause area, and I said that was a confused claim. Feel free to find someone who disagrees with me here, but the proximate causes of EAs worrying about biorisk have nothing to do with BSL lab designations. It's not BSL levels that failed in allowing things like the Soviet bioweapons program, or led to the underfunded and largely unenforceable BWC, or the way that newer technologies are reducing the barriers to terrorists and others being able to pursue bioweapons.
I did not say that they didn't want to ban things, I explicitly said "whether to allow certain classes of research at all," and when I said "happy to rely on those levels," I meant that the idea that we should have "BSL-5" is the kind of silly thing that novice EAs propose that doesn't make sense, because there literally isn't something significantly more restrictive other than just banning it.
I also think that "nearly all EA's focused on biorisk think gain of function research should be banned" is obviously underspecified, and wrong because of the details. Yes, we all think that there is a class of work that should be banned, but tons of work that would be called gain of function isn't in that class.
BSL levels, which have failed so consistently and catastrophically they've induced an EA cause area,
This is confused and wrong, in my view. The EA cause area around biorisk is mostly happy to rely on those levels, and unlike for AI, the (very useful) levels predate EA interest and give us something to build on. The questions are largely instead about whether to allow certain classes of research at all, the risks of those who intentionally do things that are forbidden, and how new technology changes the risk.
and then the 2nd AI pays some trivial amount to the 1st for the inconvenience
Completely as an aside, coordination problems among ASI don't go away, so this is a highly non-trivial claim.
I thought that the point was that either managed-interface-only access, or API access with rate limits, monitoring, and an appropriate terms of service, can prevent use of some forms of scaffolding. If it's staged release, this makes sense to do, at least for a brief period while confirming that there are not security or safety issues.
These days it's rare for a release to advance the frontier substantially.
This seems to be one crux. Sure, there's no need for staged release if the model doesn't actually do much more than previous models, and doesn't have unpatched vulnerabilities of types that would be identified by somewhat broader testing.
The other crux, I think, is around public release of model weights. (Often referred to, incorrectly, as "open sourcing.") Staged release implies not releasing weights immediately - and I think this is one of the critical issues with what companies like X have done that make it important to demand staged release for any models claiming to be as powerful or more powerful than current frontier models. (In addition to testing and red-teaming, which they also don't do.)
It is funny, but it also showed up on April 2nd in Europe and anywhere farther east...
I think there are two very different cases of "almost works" that are being referred to. The first is where the added effort is going in the right direction, and the second is where it is slightly wrong. For the first case, if you have a drug that doesn't quite treat your symptoms, it might be because it addresses all of them somewhat, in which case increasing the dose might make sense. For the second case, you could have one that addresses most of the symptoms very well, but makes one worse, or has an unacceptable side effect, in which case increasing the dose wouldn't help. Similarly, we could imagine a muscle that is uncomfortable. The second case might then be a stretch that targets almost the right muscle. That isn't going to help if you do it more. The first case, on the other hand, would be a stretch that targets the right muscle but isn't doing enough, and obviously it could be great to do more often, or for a longer time.
Again, I think it was a fine and enjoyable post.
But I didn't see where you "demonstrate how I used very basic rationalist tools to uncover lies," which could have improved the post, and I don't think this really explored any underappreciated parts of "deception and how it can manifest in the real world" - which I agree is underappreciated. Unfortunately, this post didn't provide much clarity about how to find it, or how to think about it. So again, it's a fine post, good stories, and I agree they illustrate being more confused by fiction than reality, and other rationalist virtues, but as I said, it was not "the type of post that leads people to a more nuanced or better view of any of the things discussed."
I disagree with this decision, not because I think it was a bad post, but because it doesn't seem like the type of post that leads people to a more nuanced or better view of any of the things discussed, much less a post that provided insight or better understanding of critical things in the broader world. It was enjoyable, but not what I'd like to see more of on Less Wrong.
(Note: I posted this response primarily because I saw that lots of others also disagreed with this, and think it's worth having on the record why at least one of us did so.)
"Climate change is seen as a bit less of a significant problem"
That seems shockingly unlikely (5%) - even if we have essentially eliminated all net emissions (10%), we will still be seeing continued warming (99%) unless we have widely embraced geoengineering (10%). If we have, it is a source of significant geopolitical contention (75%) due to uneven impacts (50%) and pressure from environmental groups (90%) worried that it is promoting continued emissions and / or causes other harms. Progress on carbon capture is starting to pay off (70%) but is not (90%) deployed at anything like the scale needed to stop or reverse warming.
Adaptation to climate change has continued (99%), but it is increasingly obvious how expensive it is and how badly it is impacting the developing world. The public still seems to think this is the fault of current emissions (70%), and carbon taxes or similar legal limits are in place for a majority of G7 countries (50%) but less than half of other countries (70%).
To start, the claim that it was found 2 miles from the facility is an important mistake, because WIV is 8 miles from the market. For comparison to another city people might know better, in New York, that's the distance between World Trade Center and either Columbia University, or Newark Airport. Wuhan's downtown is around 16 miles across. 8 miles away just means it was in the same city.
And you're over-reliant on the evidence you want to pay attention to. For example, even restricting ourselves to "nearby coincidence" evidence, the Huanan market is the largest in central China - so what are the odds that a natural spillover event occurs immediately surrounding the largest animal market? If the disease actually emerged from WIV, what are the odds that the cases centered around the Huanan market, 8 miles away, instead of the Baishazhou live animal market, 3 miles away, or the Dijiao market, also 8 miles away?
So I agree that an update can be that strong, but this one simply isn't.
Yeah, but I think that it's more than not taken literally, it's that the exercise is fundamentally flawed when being used as an argument instead of very narrowly for honest truth-seeking, which is almost never possible in a discussion without unreasonably high levels of trust and confidence in others' epistemic reliability.
- What is the relevance of the "posterior" that you get after updating on a single claim that's being chosen, post-hoc, as the one that you want to use as an example?
- Using a weak prior biases towards thinking the information you have to update with is strong evidence. How did you decide on that particular prior? You should presumably have some reference class for your prior. (If you can't do that, you should at least have equipoise between all reasonable hypotheses being considered. Instead, you're updating "Yes Lableak" versus "No Lableak" - but in fact, "from a Bayesian perspective, you need an amount of evidence roughly equivalent to the complexity of the hypothesis just to locate the hypothesis in theory-space. It’s not a question of justifying anything to anyone.")
- How confident are you in your estimate of the bayes factor here? Do you have calibration data for roughly similar estimates you have made? Should you be adjusting for less than perfect confidence?
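To make the arithmetic behind these questions concrete (the numbers below are purely illustrative, not my estimates of the lab-leak case): the posterior depends heavily on the prior odds, not just the Bayes factor, which is why the choice of prior and reference class matters so much.

```python
def posterior_from_bayes_factor(prior_prob, bayes_factor):
    # Posterior odds = prior odds * Bayes factor; convert back to probability.
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1.0 + post_odds)

# The same Bayes factor of 10 gives very different posteriors:
# from a 50% prior it yields ~0.91, but from a 5% prior only ~0.34.
```

So a "strong update" on top of an equipoise prior looks decisive, while the same evidence on top of a properly grounded reference-class prior may leave the hypothesis as a minority view.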