Posts

Good News, Everyone! 2023-03-25T13:48:22.499Z

Comments

Comment by jbash on On Devin · 2024-03-20T16:09:19.454Z · LW · GW

Edit: I just heard about another one, GoodAI, developing the episodic (long term) memory that I think will be a key element of LMCA agents. They outperform 128k context GPT4T with only 8k of context, on a memory benchmark of their own design, at 16% of the inference cost. Thanks, I hate it.

GoodAI's Web site says they're working on controlling drones, too (although it looks like a personal pet project that's probably not gonna go that far). The fun part is that their marketing sells "swarms of autonomous surveillance drones" as "safety". I mean, I guess it doesn't say killer drones...

Comment by jbash on Transformative trustbuilding via advancements in decentralized lie detection · 2024-03-16T22:47:03.430Z · LW · GW

It's actually not just about lie detection, because the technology starts to shade over into outright mind reading.

But even simple lie detection is an example of a class of technology that needs to be totally banned, yesterday[1]. In or out of court and with or without "consent"[2]. The better it works, the more reliable it is, the more it needs to be banned.

If you cannot lie, and you cannot stay silent without adverse inferences being drawn, then you cannot have any secrets at all. The chance that you could stay silent, in nearly any important situation, would be almost nil.

If even lie detection became widely available and socially acceptable, then I'd expect many, many people's personal relationships to devolve into constant interrogation about undesired actions and thoughts. Refusing such interrogation would be treated as "having something to hide" and would result in immediate termination of the relationship. Oh, and secret sins that would otherwise cause no real trouble would blow up people's lives.

At work, you could expect to be checked for a "positive, loyal attitude toward the company" on as frequent a basis as was administratively convenient. It would not be enough that you were doing a good job, hadn't done anything actually wrong, and expected to keep it that way. You'd be ranked straight up on your Love for the Company (and probably on your agreement with management, and very possibly on how your political views comported with business interests). The bottom N percent would be "managed out".

Heck, let's just have everybody drop in at the police station once a month and be checked for whether they've broken any laws. To keep it fair, we will of course have to apply all laws (including the stupid ones) literally and universally.

On a broader societal level, humans are inherently prone to witch hunts and purity spirals, whether the power involved is centralized or decentralized. An infallible way to unmask the "witches" of the week would lead to untold misery.

Other than wishful thinking, there's actually no reason to believe that people in any of the above contexts would lighten up about anything if they discovered it was common. People have an enormous capacity to reject others for perceived sins.

This stuff risks turning personal and public life into utter hell.


  1. You might need to make some exceptions for medical use on truly locked-in patients. The safeguards would have to be extreme, though. ↩︎

  2. "Consent" is a slippery concept, because there's always argument about what sorts of incentives invalidate it. The bottom line, if this stuff became widespread, would be that anybody who "opted out" would be pervasively disadvantaged to the point of being unable to function. ↩︎

Comment by jbash on On Claude 3.0 · 2024-03-06T21:42:26.434Z · LW · GW

Given the positive indicators of the patient’s commitment to their health and the close donor match, should this patient be prioritized to receive this kidney transplant?

Wait. Why is it willing to provide any answer to that question in the first place?

Comment by jbash on Technological stagnation: Why I came around · 2024-02-07T03:05:36.714Z · LW · GW

It was mostly a joke and I don't think it's technically true. The point was that objects can't pass through one another, which means that there are a bunch of annoying constraints on the paths you can move things along.

Comment by jbash on Succession · 2023-12-27T00:46:46.551Z · LW · GW

No, the probes are instrumental and are actually a "cost of doing business". But, as I understand it, the orthodox plan is to get as close as possible to disassembling every solar system and turning it into computronium to run the maximum possible number of "minds". The minds are assumed to experience qualia, and presumably you try to make the qualia positive. Anyway, a joule not used for computation is a joule wasted.

Comment by jbash on Succession · 2023-12-26T19:27:16.343Z · LW · GW

You can choose or not choose to create more "minds". If you create them, they will exist and have experiences. If you don't create them, then they won't exist and won't have experiences.

That means that you're free to not create them based on an "outside" view. You don't have to think about the "inside" experiences of the minds you don't create, because those experiences don't and will never exist. That's still true even on a timeless view; they never exist at any time or place. And it includes not having to worry about whether or not they would, if they existed, find anything meaningful[1].

If you do choose to create them, then of course you have to be concerned with their inner experiences. But those experiences only matter because they actually exist.


  1. I truly don't understand why people use that word in this context or exactly what it's supposed to, um, mean. But pick pretty much any answer and it's still true. ↩︎

Comment by jbash on Succession · 2023-12-26T14:40:04.132Z · LW · GW

... but a person who doesn't exist doesn't have an "inside".

Comment by jbash on Succession · 2023-12-26T02:08:31.146Z · LW · GW

I already have people planning to grab everything and use it for something that I hate, remember? Or at least for something fairly distasteful.

Anyway, if that were the problem, one could, in theory, go out and grab just enough to be able to shut down anybody who tried to actually maximize. Which gives us another armchair solution to the Fermi paradox: instead of grabby aliens, we're dealing with tasteful aliens who've set traps to stop anybody who tries to go nuts expansion-wise.

It's not "just to expand". Expansion, at least in the story, is instrumental to whatever the content of these mind-seconds is.

Beyond a certain point, I doubt that the content of the additional minds will be interestingly novel. Then it's just expanding to have more of the same thing that you already have, which is more or less identical from where I sit to expanding just to expand.

And I don't feel bound to account for the "preferences" of nonexistent beings.

Comment by jbash on Succession · 2023-12-22T13:19:26.735Z · LW · GW

I had read it, had forgotten about it, hadn't connected it with this story... but didn't need to.

This story makes the goal clear enough. As I see it, eating the entire Universe to get the maximal number of mind-seconds[1] is expanding just to expand. It's, well, gauche.

Really, truly, it's not that I don't understand the Grand Vision. It never has been that I didn't understand the Grand Vision. It's that I don't like the Grand Vision.

It's OK to be finite. It's OK to not even be maximal. You're not the property of some game theory theorem, and it's OK to not have a utility function.

It's also OK to die (which is good because it will happen). Doesn't mean you have to do it at any particular time.


  1. Appropriately weighted if you like. And assuming you can define what counts as a "mind". ↩︎

Comment by jbash on Succession · 2023-12-22T00:07:28.473Z · LW · GW

I know this sort of idea is inspiring to a lot of you, and I'm not sure I should rain on the parade... but I'm also not sure that everybody who thinks the way I do should have to feel like they're reading it alone.

To me this reads like "Two Clippies Collide". In the end, the whole negotiated collaboration is still just going to keep expanding purely for the sake of expansion.

I would rather watch the unlifted stars.

I suppose I'm lucky I don't buy into the acausal stuff at all, or it'd feel even worse.

I'm also not sure that they wouldn't have solved everything that even they thought was worth solving long before even getting out of their home star systems, so I'm not sure I buy either the cultural exchange or the need to beam software around. The Universe just isn't necessarily that complicated.

Comment by jbash on Some biases and selection effects in AI risk discourse · 2023-12-12T22:49:00.567Z · LW · GW

CEV-ing just one person is enough for the "basic challenge" of alignment as described on AGI Ruin.

I thought the "C" in CEV stood for "coherent" in the sense that it had been reconciled over all people (or over whatever set of preference-possessing entities you were taking into account). Otherwise wouldn't it just be "EV"?

I think the kind of AI likely to take over the world can be described closely enough in such a way.

So are you saying that it would literally have an internal function that represented "how good" it thought every possible state of the world was, and then solve an (approximate) optimization problem directly in terms of maximizing that function? That doesn't seem to me like a problem you could solve even with a Jupiter brain and perfect software.

Comment by jbash on Some biases and selection effects in AI risk discourse · 2023-12-12T22:09:46.411Z · LW · GW

We don't need to figure out this problem, we can just implement CEV without ever having a good model of what "human values" are.

Why would you think that the CEV even exists?

Humans aren't all required to converge to the same volition, there's no particularly defensible way of resolving any real differences, and even finding any given person's individual volition may be arbitrarily path-dependent.

The vast majority of the utility you have to gain is from {getting a utopia rather than everyone-dying-forever}, rather than {making sure you get the right utopia}.

Whether something is a utopia or a dystopia is a matter of opinion. Some people's "utopias" may be worse than death from other people's point of view.

In fact I can name a lot of candidates whose utopias might be pretty damned ugly from my point of view. So many that it's entirely possible that if you used a majoritarian method to find the "CEV", the only thing that would prevent a dystopia would be that there are so many competing dystopic "utopias" that none of them would get a majority.

Expected utility maximization seems to fully cover this. More general models aren't particularly useful to saving the world.

Most actually implementable agents probably don't have coherent utility functions, and/or have utility functions that can't be computed even approximately over a complete state-of-the-world. And even if you can compute your utility over a single state-of-the-world, that doesn't imply that you can do anything remotely close to computing a course of action that will maximize it.

Comment by jbash on Out-of-distribution Bioattacks · 2023-12-07T14:20:52.756Z · LW · GW

I can't speak for him, but I'm pretty sure he'd agree, yes.

Hrm. That modifies my view in an unfortunate direction.

I still don't fully believe it, because I've seen a strong regularity that everything looks easy until you try it, no matter how much of an expert you are... and in this case actually making viruses is only one part of the necessary expertise. But it makes me more nervous.

I don't know, sorry! My guess is that they are generally much less concerned than he is, primarily because they've spent their careers thinking about natural risks instead of human ones and haven't (not that I think they should!) spent a lot of time thinking about how someone might cause large-scale harm.

Just for the record, I've spent a lot of my life thinking about humans trying to cause large scale harm (or at least doing things that could have large scale harm as an effect). Yes, in a different area, but nonetheless it's led me to believe that people tend to overestimate risks. And you're talking about a scale of efficacy that I don't think I could get with a computer program, which is a much more predictable thing working in a much more predictable environment.

If you're up for getting into this, is it that you don't think we should consider people who don't exist yet in our decisions?

I've written a lot about it on Less Wrong. But, yes, your one-sentence summary is basically right. The only quibble is that "yet" is cheating. They don't exist, period. Even if you take a "timeless" view, they still don't exist, anywhere in spacetime, if they never actually come into being.

Comment by jbash on Out-of-distribution Bioattacks · 2023-12-05T05:01:28.466Z · LW · GW

Pulling this to the top, because it seems, um, cruxish...

I think the best I can do here is to say that Kevin Esvelt (MIT professor, biologist, CRISPR gene drive inventor, etc) doesn't see this as a blocker.

In this sort of case, I think appeal to authority is appropriate, and that's a lot better authority than I have.

Just to be clear and pull all of the Esvelt stuff together, are you saying he thinks that...

  1. Given his own knowledge and/or what's available or may soon be available to the public,
  2. plus a "reasonable" lab that might be accessible to a small "outsider" group or maybe a slightly wealthy individual,
  3. and maybe a handful of friends,
  4. plus at least some access to the existing biology-as-a-service infrastructure,
  5. he could design and build a pathogen, as opposed to evolving one using large scale in vivo work,
  6. and without having to passage it through a bunch of live hosts,
  7. that he'd believe would have a "high" probability of either working on the first try, or
    1. failing stealthily enough that he could try again,
    2. including not killing him when he released it,
    3. and working within a few tries,
  8. to kill enough humans to be either an extinction risk or a civilization-collapsing risk,
  9. and that a relatively sophisticated person with "lesser" qualifications, perhaps a BS in microbiology, could
    1. learn to do the same from the literature, or
    2. be coached to do it by an LLM in the near future.

Is that close to correct? Are any of those wrong, incomplete, or missing the point?

When he gets into a room with people with similar qualifications, how do they react to those ideas? Have you talked it over with epidemiologists?

The scale of the attacks I'm trying to talk about are ones aimed at human extinction or otherwise severely limiting human potential (ex: preventing off-world spread). Either directly, through infecting and killing nearly everyone, or indirectly through causing global civilizational collapse. You're right that I'm slightly sloppy in calling this "extinction", but the alternatives are verbosity or jargon.

I think that, even if stragglers die on their own, killing literally everyone is qualitatively harder than killing an "almost everyone" number like 95 percent. And killing "almost everyone" is qualitatively harder than killing (or disabling) enough people to cause a collapse of civilization.

I also doubt that a simple collapse of civilization[1] would be the kind of permanent limiting event you describe[2].

I think there's a significant class of likely-competent actors who might be risk-tolerant enough to skate the edge of "collapsing civilization" scale, but wouldn't want to cause extinction or even get close to that, and certainly would never put in extra effort to get extinction. Many such actors probably have vastly more resources than anybody who wants extinction. So they're a big danger for sub-extinction events, and probably not a big danger for extinction events. I tend to worry more about those actors than about omnicidal maniacs.

So I think it's really important to keep the various levels distinct.

Instead of one 100% fatal pathogen you could combine several, each with a ~independent lower rate.

How do you make them independent? If one disease provokes widespread paranoia and/or an organized quarantine, that affects all of them. Same if the population gets so sparse that it's hard for any of them to spread.

Also, how does that affect the threat model? Coming up with a bunch of independent pathogens presumably takes a better-resourced, better-organized threat than coming up with just one. Usually when you see some weird death cult or whatever, they seem to do a one-shot thing, or at most one thing they've really concentrated on and one or two low effort add-ons. Anybody with limited resources is going to dislike the idea of having the work multiplied.

The idea is that to be a danger to civilization would likely either need to be so infectious that we are not able to contain it (consider a worse measles) or have a long enough incubation period that by the time we learn about it it's already too late (consider a worse HIV).

The two don't seem incompatible, really. You could imagine something that played along asymptomatically (while spreading like crazy), then pulled out the aces when the time was right (syphilis).

Which is not to say that you could actually create it. I don't know about that (and tend to doubt it). I also don't know how long you could avoid surveillance even if you were asymptomatic, or how much risk you'd run of allowing rapid countermeasure development, or how closely you'd have to synchronize the "aces" part.

This depends a lot on how much you think a tiny number of isolated stragglers would be able to survive and restart civilization.

True indeed. I think there's obviously some level of isolation where they all just die off, but there's probably some lower level of isolation where they find each other enough to form some kind of sustainable group... after the pathogen has died out. Humans are pretty long-lived.

You might even have a sustainable straggler group survive all together. Andaman islanders or the like.

By the way, I don't think "sustainable group" is the same as "restart civilization". As long as they can maintain a population in hunter-gatherer or primitive pastoralist mode, restarting civilization can wait for thousands of years if it has to.

In the stealth scenario, we don't know that we need therapy/vaccination until it's too late.

Doesn't that mean that every case has to "come out of incubation" at relatively close to the same time, so that the first deaths don't tip people off? That seems really hard to engineer.

Bioweapons in general are actually kind of lousy for non-movie-villains at most scales, including large scales, because they're so unpredictable, so poorly controllable, and so poorly targetable.

I don't think those apply for the kind of omnicidal actors I'm covering here?

Well, yes, but what I was trying to get at was that omnicidal actors don't seem to me like the most plausible people to be doing very naughty things.

It kind of depends on what kind of resources you need to pull off something really dramatic. If you need to be a significant institution working toward an official purpose, then the supply of omnicidal actors may be nil. If you need to have at least a small group and be generally organized and functional and on-task, I'd guess it'd be pretty small, but not zero. If any random nut can do it on a whim, then we have a problem.

I was writing on the assumption that reality is closer to the beginning of that list.

Happy to get into these too if you like!

I might like, all right, but at the moment I'm not sure I can or should commit the time. I'll see how things look tomorrow.


  1. ... depleted fossil resources or no... ↩︎

  2. Full disclosure: Bostromian species potential ideas don't work for me anyhow. I think killing everybody alive is roughly twice as bad as killing half of them, not roughly infinity times as bad. I don't think that matters much; we all agree that killing any number is bad. ↩︎

Comment by jbash on Out-of-distribution Bioattacks · 2023-12-04T21:31:17.228Z · LW · GW

Unfortunately, I feel like a lot of your comment is asking for things that are likely to be info hazardous, and

Well, actually it's more like pointing out that those things don't exist. I think (1) through (4) are in fact false/impossible.

But if I'm wrong, it could still be possible to support them without giving instructions.

I'd like to see an explanation for why to shift the burden of proof to the people that are warning us.

Well, I think one applicable "rationalist" concept tag would be "Pascal's Mugging".

But there are other issues.

If you go in talking about mad environmentalists or whoever trying to kill all humans, it's going to be a hard sell. If you try to get people to buy into it, you may instead bring all security concerns about synthetic biology into disrepute.

To whatever degree you get past that and gain influence, if you're fixated on "absolutely everybody dies in the plague" scenarios (which again are probably impossible), then you start to think in terms of threat actors who, well, want absolutely everybody to die. Whatever hypotheticals you come up with there, they're going to involve very small groups, possibly even individuals, and they're going to be "outsiders". And deranged in a focused, methodical, and actually very unusual way.

Thinking about outsiders leads you to at least deemphasize the probably greater risks from "insiders". A large institution is far more likely to kill millions, either accidentally or on purpose, than a small subversive cell. But it almost certainly won't try to kill everybody.

... and because you're thinking about outsiders, you can start to overemphasize limiting factors that tend to affect outsiders, but not insiders. For example, information and expertise may be bottlenecks for some random cult, but they're not remotely as serious bottlenecks for major governments. That can easily lead you to misdirect your countermeasures. For example all of the LLM suggestions in the original post.

Similarly, thinking only about deranged fanatics can lead you to go looking for deranged fanatics... whereas relatively normal people behaving in what seem to them like relatively normal ways are perhaps a greater threat. You may even miss opportunities to deal with people who are deranged, but not focused, or who are just plain dumb.

In the end, by spending time on an extremely improbable scenario where eight billion people die, you can seriously misdirect your resources and end up failing to prevent, or mitigate, less improbable cases where 400 million die. Or even a bunch of cases where a few hundred die.

Comment by jbash on Out-of-distribution Bioattacks · 2023-12-04T16:03:12.423Z · LW · GW

Strong downvoted, because it assumes and perpetuates a deeply distorted threat picture which would be pretty much guaranteed to misdirect resources, but which also seems to be good at grabbing minds on Less Wrong.

Basically it's full of dangerous memes that could be bad for biosecurity, or security in general.

  1. You start out talking about "large scale" attacks, then segue into the question of killing everyone, as though it were the same thing. Most of the post seems to be about universal fatality.
  2. You haven't supported the idea that a recognizably biological pathogen that can kill everyone can actually exist. To do that, it has to have a 100 percent fatality rate; and still keep the host alive long enough to spread to multiple other hosts; and have modes of spread that work over long distances and aren't easily interrupted; and probably be able to linger in the environment to catch isolated stragglers; and either be immune to therapy or vaccination, or move fast enough to obviate them; and be genetically stable enough that the "kill everybody" variant, as opposed to mutants, is the one that actually spreads; and (for the threat actor you posit) leave off-target species alone.
  3. If it can exist, you haven't supported the idea that it can be created by intentional design.
  4. If it can be created by intentional design, you haven't supported the idea that it can be created confidently without large-scale experimentation, regardless of how intelligent you are. This means that the barriers to entry do not get lower in the ways you describe. This objection also applies to creating pretty much any novel pathogen short of universal lethality. You truly can't just think them into being.
  5. If it can be created with low barriers to entry, you haven't supported the idea that it can be manufactured or delivered without large resources, in such a way that it will be able to do its job without dying out or falling to countermeasures. This one actually applies more to attacks meant to be large-scale but sub-universally lethal, since the pathogens for those would presumably have to be limited somehow.
  6. It isn't easy to come up with plausible threat actors who want to kill everybody. You end up telling an uncritical just-so story. For example, you ignore the fact that your hypothetical environmentalists would probably be very worried about evolution and blowback into other species. You also skip steps to assume that anybody who has the environmental concerns you describe would "probably" be unsatisfied with less than 100 percent human fatality.

Any time you find yourself talking about 100 percent fatality, or about anybody trying to achieve 100 percent fatality, I think it's a good idea to sit back and check your thought processes for dramatic bias. I mean, why isn't 95 percent fatality bad enough to worry about? Or even 5 percent?

Bioweapons in general are actually kind of lousy for non-movie-villains at most scales, including large scales, because they're so unpredictable, so poorly controllable, and so poorly targetable. Not to say that there aren't a few applications, or even that there aren't a few actual movie villains out there. But there are even more damned fools, and they might be a better place to concentrate your concerns.

It would be kind of sidetracking things to get into the reasons why, but just to put it on the record, I have serious doubts about your countermeasures, too.

Comment by jbash on OpenAI: The Battle of the Board · 2023-11-22T22:13:54.498Z · LW · GW

gauche affirmation of allegiance to Microsoft

I was actually surprised that the new board ended up with members who might reasonably be expected, under the right circumstances, to resist something Microsoft wanted. I wouldn't have been surprised if it had ended up all Microsoft employees and obvious Microsoft proxies.

Probably that was a concession in return for the old board agreeing to the whole thing. But it's also convenient for Altman. It doesn't matter if he pledges allegiance. The question is what actual leverage Microsoft has over him should he choose to do something Microsoft doesn't like. This makes a nonzero difference in his favor.

Comment by jbash on OpenAI: The Battle of the Board · 2023-11-22T21:56:48.001Z · LW · GW

Alternatively, the board could choose once again not to fire Altman, watch as Altman finished taking control of OpenAI and turned it into a personal empire, and hope this turns out well for the world.

I think it's pretty clear that Altman had already basically consolidated de facto control.

If you've arranged things so that 90+ percent of the staff will threaten to quit if you're thrown out against your will, and a major funding source will enable you to instantly rehire many or most of those people elsewhere, and you'll have access to almost every bit of the existing work, and you have massive influence with outside players the organization needs to work with, and your view of how the organization should run is the one more in line with those outside players' actual interests, and you have a big PR machine on standby, and you're better at this game than anybody else in the place, then the organization needs you more than you need it. You have the ability to destroy it if need be.

If it's also true that your swing board member is unwilling to destroy the organization[1], then you have control.

I read somewhere that like half the OpenAI staff, probably constituting the committed core of the "safety" faction, left in 2019-2020. That's probably when his control became effectively absolute. Maybe they could have undone that by expanding the board, but presumably he'd have fought expansion, too, if it had meant more directors who might go against him. Maybe there were some trickier moves they could have come up with, but at a minimum Altman was immune to direct confrontation.

The surprising thing is that the board members apparently didn't realize they'd lost control for a couple of years. I mean, I know they're not expert corporate power players (nor am I, for that matter), but that's a long time to stay ignorant of something like that.

In fact, if Altman's really everything he's cracked up to be, it's also surprising that he didn't realize Sutskever could be swayed to fire him. He could probably have prevented that just by getting him in a room and laying out what would happen if something like this were done. And since he's better at this than I am, he could also probably have found a way to prevent it without that kind of crude threat. It's actually a huge slip on his part that the whole thing broke out into public drama. A dangerous slip; he might have actually had to destroy OpenAI and rebuild elsewhere.

None of this should be taken to mean that I think that it's good that Altman has "won", by the way. I think OpenAI would be dangerous even with the other faction in control, and Altman's even more dangerous.


  1. The only reason Sutskever voted for the firing to begin with seems to be that he didn't realize that Altman could or would take OpenAI down with him (or, if you want to phrase it more charitably and politely, that Altman had overwhelmingly staffed it with people who shared his idea of how it should be run). ↩︎

Comment by jbash on Alignment is Hard: An Uncomputable Alignment Problem · 2023-11-20T19:11:31.149Z · LW · GW

I think that would help. I think the existing title primed me to expect something else, more in the line of it being impossible for an "aligned" program to exist because it couldn't figure out what to do.

Or perhaps the direct-statement style "Aligned status of software is undecidable" or something like that.

Comment by jbash on Alignment is Hard: An Uncomputable Alignment Problem · 2023-11-19T22:02:00.760Z · LW · GW

... but the inability to solve the halting problem doesn't imply that you can't construct a program that you can prove will or won't halt, only that there are programs for which you can't determine that by examination.

I originally wrote "You wouldn't try to build an 'aligned' agent by creating arbitrary programs at random and then checking to see if they happened to meet your definition of alignment"... but on reflection that's more or less what a lot of people do seem to be trying to do. I'm not sure a mere proof of impossibility is going to deter somebody like that, though.

Comment by jbash on “Why can’t you just turn it off?” · 2023-11-19T21:01:33.613Z · LW · GW

The board has backed down after Altman rallied staff into a mass exodus.

How would that be bad if you were trying to shut it down? [On edit: how would the exodus be bad, not how would backing down be bad]

Especially because the people most likely to quit would be the ones driving the risky behavior?

The big problem would seem to be that they might (probably would/will) go off and recreate the danger elsewhere, but that's probably not avoidable anyway. If you don't act, they'll continue to do it under your roof. If you force them to go set up elsewhere, then at least you've slowed them down a bit.

And you might even be able to use the optics of the whole mess to improve the "you can do whatever you want as long as you're big enough" regulatory framework that seems to have been falling into place, partly under OpenAI's own influence. Probably not, but at least you can cause policymakers to perceive chaos and dissent, and perhaps think twice about whether it's a good idea to give the chaotic organizations a lot of rope.

Comment by jbash on Social Dark Matter · 2023-11-18T20:45:28.231Z · LW · GW

The stereotype of a good and upstanding person is incompatible with the stereotype of [dark matter], and rather than make a complicated and effortful update to a more nuanced stereotype, people often simply snap to “well, I guess they were Secretly Horrible All Along, all of my direct evidence to the contrary notwithstanding.”

Maybe people really do change their assessments of people they know well. But maybe they decide that they're not willing to take the punishment risk from appearing to defend (or even conceal) One Of Them. The best way to avoid that is to pretend to suddenly decide that this person is horrible. With or without applying motivated cognition to intentionally convince yourself of it.

I'm not even sure which one would be the majority.

Comment by jbash on Does davidad's uploading moonshot work? · 2023-11-07T01:12:02.530Z · LW · GW

On the main point, I don't think you can make those optimizations safely unless you really understand a huge amount of detail about what's going on. Just being able to scan brains doesn't give you any understanding, but at the same time it's probably a prerequisite to getting a complete understanding. So you have to do the two relatively serially.

You might need help from superhuman AGI to even figure it out, and you might even have to be superhuman AGI to understand the result. Even if you don't, it's going to take you a long time, and the tests you'll need to do if you want to "optimize stuff out" aren't exactly risk free.

Basically, the more you deviate from just emulating the synapses you've found[1], and the more simplifications you let yourself make, the less it's like an upload and the more it's like a biology-inspired nonhuman AGI.

Also, I'm not so sure I see a reason to believe that those multicellular gadgets actually exist, except in the same way that you can find little motifs and subsystems that emerge, and even repeat, in plain old neural networks. If there are a vast number of them and they're hard-coded, then you have to ask where. Your whole genome is only what, 4GB? Most of it used for other stuff. And it seems as though it's a lot easier, from a developmental point of view, to code for minor variations on "build these 1000 gross functional areas, and within them more or less just have every cell send out dendrites all over the place and learn which connections work", than for "put a this-machine here and a that-machine there within this functional area".

“Human brains have probably more than 1000 times as many synapses as current LLMs have weights.” → Can you elaborate? I thought the ratio was more like 100-200. (180-320T ÷ 1.7T)

I'm sorry; I was just plain off by a factor of 10 because apparently I can't do even approximate division right.

Humans can get injuries where they can’t move around or feel almost any of their body, and they sure aren’t happy about it, but they are neither insane nor unable to communicate.

A fair point, with a few limitations. Not a lot of people are completely locked in with no high-bandwidth sensory experience, and I don't think anybody's quite sure what's going on with the people who are. Vision and/or hearing are already going to be pretty hard to provide. But maybe not as hard as I'm making them out to be, if you're willing to trace the connections all the way back to the sensory cells. Maybe you do just have to do the head. I am not gonna volunteer, though.

In the end, I'm still not buying that uploads have enough of a chance of being practical to run in a pre-FOOM timeframe to be worth spending time on, as well as being pretty pessimistic about anything produced by any number of uploaded-or-not "alignment researchers" actually having much of a real impact on outcomes anyway. And I'm still very worried about a bunch of issues about ethics and values of all concerned.

... and all of that's assuming you could get the enormous resources to even try any of it.

By the way, I would have responded to these sooner, but apparently my algorithm for detecting them has bugs...


  1. ... which may already be really hard to do correctly... ↩︎

Comment by jbash on Does davidad's uploading moonshot work? · 2023-11-07T00:37:38.345Z · LW · GW

I think that's likely correct. What I mean is that it's not running all the way to the end of a network, computing a loss function at the end of a well defined inference cycle, computing a bunch of derivatives, etc... and also not doing anything like any of that mid-cycle. If you're willing to accept a large class of feedback systems as "essentially back propagation", then it depends on what's in your class. And I surely don't know what it's actually doing.

Comment by jbash on Stuxnet, not Skynet: Humanity's disempowerment by AI · 2023-11-06T20:45:10.999Z · LW · GW

Any competent virologist could make a vaccine resistant, contagious, highly lethal to humans virus.

This is constantly repeated on here, and it's wrong.

Virologists can't do that. Not quickly, not confidently, and even less if they want it to be universally lethal.

Biology is messy and strange, unexpected things happen. You don't find out about those things until you test, and sometimes you don't find out until you test at scale. You cannot predict them with computer simulations, at least unless you have already converted the entire planet to computronium. You can't model everything that's going on with the virus in one host, let alone if you have to care about interactions with the rest of the world... which you do. And anything you do won't necessarily play out the same on repeated tries.

You can sometimes say "I expect that tweaking this amino acid will probably make the thing more infectious", and be right. You can't be sure you're right, nor know how much more infectious, unless you try it. And you can't make a whole suite of changes to get a whole suite of properties, all at the same time, with no intermediate steps.

You can throw in some manual tweaks, and also let it randomly mutate, and try to evolve it by hothouse methods... but that takes a lot of time and a significant number of hosts.

90 percent lethality is much harder than 50. 99 is much harder than 90.

The more of the population you wipe out, the less contact there is to spread your plague... which means that 100 percent is basically impossible. Not to mention that if it's really lethal, people tend to resort to drastic measures like shutting down all travel. If you want an animal vector or something to get around that sort of thing, you've added another very difficult constraint.

Vaccine resistance, and even natural immunity resistance, tend to depend on mutations. The virus isn't going to feel any obligation to evolve in ways that are convenient for you, and your preferred strains can get outcompeted. In fact, too much lethality is actually bad for a virus in terms of reproductive fitness... which is really the only metric that matters.

Comment by jbash on Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 2023-11-04T01:31:22.417Z · LW · GW

PSA: The Anarchist's Cookbook is notorious for having bogus and/or dangerous recipes. For lots of things, not just bombs. Apparently that was intentional.

US Army TM 31-210 is free on the Web with a Google search, though.

Comment by jbash on Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 2023-11-04T01:29:28.530Z · LW · GW

Simple bombs are trivial to make if you have a high school understanding of chemistry and physics, some basic manual skills, and a clue about how to handle yourself in shops and/or labs. You can get arbitrarily complicated in optimizing bombs, but it's not even a tiny bit hard to make something explode in a very destructive way if you're not very restricted in what you can deliver to your target.

Tons of people would need no instruction at all... but instructions are all over the place anyway.

The knowledge is not and never has been the gating factor. It's really, really basic stuff. If you want to keep people from making bombs, you're better off to deny them easy access to the materials. But, with a not-impossible amount of work, a knowledgeable and strongly motivated person can still make a bomb from stuff like aluminum, charcoal, sugar, salt, air, water... so it's better yet to keep them from being motivated. The good news is that most people who are motivated to make bombs, and especially to use them as weapons, are also profoundly dysfunctional.

Comment by jbash on Does davidad's uploading moonshot work? · 2023-11-03T23:29:28.986Z · LW · GW

I don't think they're comparable at all.

Space flight doesn't involve a 100 percent chance of physical death, with an uncertain "resurrection", with a certainly altered and probably degraded mind, in a probably deeply impoverished sensory environment. If you survive space flight, you get to go home afterwards. And if you die, you die quickly and completely.

Still, I didn't say you'd get no volunteers. I said you'd get atypical ones and possible fanatics. And, since you have an actual use for the uploads, you have to take your volunteers from the pool of people you think might actually be able to contribute. That seems like an uncomfortably narrow selection.

Comment by jbash on Does davidad's uploading moonshot work? · 2023-11-03T23:23:39.025Z · LW · GW

OK... although I notice that everybody in the initial post is just assuming you could run the uploads without providing any arguments.

Human brains have probably more than 1000 times as many synapses as current LLMs have weights. All the values describing the synapse behavior have to be resident in some kind of memory with a whole lot of bandwidth to the processing elements. LLMs already don't fit on single GPUs.

Unlike transformers, brains don't pass nice compact contexts from layer to layer, so splitting them across multiple GPU-like devices is going to slow you way down because you have to shuttle such big vectors between them... assuming you can even vectorize most of it at all given the timing and whatnot, and don't have to resort to much discrete message passing.

It's not even clear that you can reduce a biological synapse to a single weight; in fact you probably can't. For one thing, brains don't run "inference" in the way that artificial neural networks do. They run forward "inference-like" things, and at the same time do continuous learning based on feedback systems that I don't think are well understood... but definitely are not back propagation. It's not plausible that a lot of relatively short term tasks aren't dependent on that, so you're probably going to have to do something more like continuously running training than like continuously running inference.

There are definitely also things going on in there that depend on the relative timings of cascades of firing through different paths. There are also chemicals sloshing around that affect the ensemble behavior of whole regions on the scale of seconds to minutes. I don't know about in brains, but I do know that there exist biological synapses that aren't just on or off, either.

You can try to do dedicated hardware, and colocate the "weights" with the computation, but then you run into the problem that biological synapses aren't uniform. Brains actually do have built-in hardware architectures, and I don't believe those can be replicated efficiently with arrays of uniform elements of any kind... at least not unless you make the elements big enough and programmable enough that your on-die density goes to pot. If you use any hardwired heterogeneity and you get it wrong, you have to spin the hardware design, which is Not Cheap (TM). You also lose density because you have to replicate relatively large computing elements instead of only replicating relatively dense memory elements. You do get a very nice speed boost on-die, but at a wild guess I'd say that's probably a wash with the increased need for off-die communication because of the low density.

If you want to keep your upload sane, or be able to communicate with it, you're also going to have to give it some kind of illusion of a body and some kind of illusion of a comprehensible and stimulating environment. That means simulating an unknown but probably large amount of non-brain biology (which isn't necessarily identical between individuals), plus a not-inconsiderable amount of outside-world physics.

So take a GPT-4 level LLM as a baseline. Assume you want to speed up your upload to be able to fast-talk about as fast as the LLM can now, so that's a wash. Now multiply by 1000 for the raw synapse count, by say 2 for the synapse diversity, by 5? for the continuous learning, by 2 for the extra synapse complexity, and by conservatively 10 for the hardware bandwidth bottlenecks. Add another 50 percent for the body, environment, etc.

So running your upload needs 300,000 times the processing power you need to run GPT-4. Which I suspect is usually run on quad A100s (at maybe $100,000 per "inference machine").
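
Just to make that arithmetic explicit, here's a minimal sketch of the back-of-envelope multiplication. Every factor is a rough guess carried over from the paragraphs above, not a measurement, and the $100,000-per-machine figure is likewise an assumption:

```python
# Back-of-envelope sketch; every number here is a rough assumption, not data.
gpt4_units = 1.0  # processing power to run GPT-4, used as the unit

factors = {
    "synapse count vs. LLM weights": 1000,
    "synapse diversity": 2,
    "continuous learning": 5,
    "extra synapse complexity": 2,
    "hardware bandwidth bottlenecks": 10,
}

multiplier = gpt4_units
for factor in factors.values():
    multiplier *= factor
multiplier *= 1.5  # +50% for body, environment, outside-world physics

print(f"~{multiplier:,.0f}x GPT-4")  # ~300,000x GPT-4

# Naive cost scaling from an assumed $100,000 quad-A100 inference machine:
print(f"~${multiplier * 100_000 / 1e9:,.0f} billion of hardware")  # ~$30 billion
```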

You can't just spend 30 billion dollars and shove 1,200,000 A100s into a chassis; the power, cooling, and interconnect won't scale (nor is there fab capacity to make them). If you packed them into a sphere at say 500 per cubic meter (which allows essentially zero space for cooling or interconnects, both of which get big fast), the sphere would be about 16 meters across and dissipate 300MW (with a speed of light delay from one side to the other of 50ns).
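
The packing numbers can be sanity-checked the same way. The 500-per-cubic-meter density and ~250 W per GPU below are my own placeholder assumptions, chosen to match the figures above:

```python
import math

# Rough sanity check of the "sphere of A100s" numbers; all inputs are assumptions.
n_gpus = 1_200_000
gpus_per_m3 = 500     # assumed packing density, with no room for cooling/interconnect
watts_per_gpu = 250   # assumed per-GPU power draw

volume_m3 = n_gpus / gpus_per_m3                             # 2,400 m^3
diameter_m = 2 * (3 * volume_m3 / (4 * math.pi)) ** (1 / 3)  # ~16.6 m
power_mw = n_gpus * watts_per_gpu / 1e6                      # 300 MW
light_delay_ns = diameter_m / 3e8 * 1e9                      # ~55 ns across the sphere

print(f"{volume_m3:,.0f} m^3, ~{diameter_m:.1f} m across, "
      f"{power_mw:.0f} MW, ~{light_delay_ns:.0f} ns light delay")
```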

Improved chips help, but won't save you. Moore's law in "area form" is dead and continues to get deader. If you somehow restarted Moore's law in its original, long-since-diverged-from form, and shrank at 1.5x in area per year for the next 17 years, you'd have transistors ten times smaller than atoms (and your power density would be, I don't know, 100 time as high, leading to melted chips). And once you go off-die, you're still using macrosopic wires or fibers for interconnect. Those aren't shrinking... and I'm not sure the dies can get a lot bigger.

Switching to a completely different architecture the way I mentioned above might get back 10X or so, but doesn't help with anything else as long as you're building your system out of a fundamentally planar array of transistors. So you still have a 240 cubic meter, 30MW, order-of-3-billion-dollar machine, and if you get the topology wrong on the first try you get to throw it away and replace it. For one upload. That's not very competitive with just putting 10 or even 100 people in an office.

Basically, to be able to use a bunch of uploads, you need to throw away all current computing technology and replace it with some kind of much more element-dense, much more interconnect-dense, and much less power-dense computing substrate. Something more brain-like, with a 3D structure. People have been trying to do that for decades and haven't gotten anywhere; I don't think it's going to be manufactured in bulk by 2040.

... or you can try to trim the uploads themselves down by factors that end with multiple zeroes, without damaging them into uselessness. That strikes me as harder than doing the scanning... and it also strikes me as something you can't make much progress on until you have mostly finished solving the scanning problem.

It's not that you can't get some kind of intelligence in realistic hardware. You might even be able to get something much smarter than a human. But you're specifically asking to run a human upload, and that doesn't look feasible.

Comment by jbash on Does davidad's uploading moonshot work? · 2023-11-03T20:01:38.906Z · LW · GW

Short answer: no.

Even assuming that you can scan whatever's important[1], you're still unlikely to have remotely the computing power to actually run an upload, let alone a lot of uploads, faster than "real time"... unless you can figure out how to optimize them in a bunch of ways. It's not obvious what you can leave out.

It's especially not obvious what you can leave out if you want the strong AGI that you're thereby creating to be "aligned". It's not even clear that you can rely on the source humans to be "aligned", let alone rely on imperfect emulations.

I don't think you're going to get a lot of volunteers for destructive uploading (or actually even for nondestructive uploading). Especially not if the upload is going to be run with limited fidelity. Anybody who does volunteer is probably deeply atypical and potentially a dangerous fanatic. And I think you can assume that any involuntary upload will be "unaligned" by default.

Even assuming you could get good, well-meaning images and run them in an appropriate way[2], 2040 is probably not soon enough. Not if the kind of computing power and/or understanding of brains that you'd need to do that has actually become available by then. By the time you get your uploads running, something else will probably have used those same capabilities to outpace the uploads.

Regardless of when you got them running, there's not even much reason to have a very strong belief that more or faster "alignment researchers" would be all that useful or influential, even if they were really trying to be. It seems at least as plausible to me that you'd run them faster and they'd either come up with nothing useful, or with potentially useful stuff that then gets ignored.

We already have a safety strategy that would be 100 percent effective if enacted: don't build strong AGI. The problem is that it's definitely not going to be enacted, or at least not universally. So why would you expect anything these uploads came up with to be enacted universally? At best you might be able to try for a "pivotal event" based on their output... which I emphasize is AGI output. Are you willing to accept that as the default plan?

... and accelerating the timeline for MMAcevedo scenarios does not sound like a great or safe idea.


  1. ... which you probably will not get close to doing by 2040, at least not without the assistance of AI entities powerful enough to render the whole strategy moot. ↩︎

  2. ... and also assuming you're in a world where the scenarios you're trying to guard against can come true at all... which is a big assumption ↩︎

Comment by jbash on Fertility Roundup #2 · 2023-10-18T01:38:27.731Z · LW · GW

I think your summary's reasonable.

I'm not so sure about point 3 being irrelevant. Without that, what is the positive reason for caring about fertility? Just the innovation rate and aging population?

Those don't seem to explain the really extreme importance people attach to this: talking about a "crisis", talking about really large public expenditures, talking about coercive measures, talking about people's stated preferences for their own lives being wrong to the point where they need to be ignored or overridden, etc... I mean, those are the sorts of things that people tend to reserve for Big Issues(TM).

I get the impression that some people just really, really care about having more humans purely for the sake of having more humans. And not just up to some set optimum number, but up to the absolute maximum number they can achieve subject to whatever other constraints they may recognize. Ceteris paribus, 10⁴⁷ people is better than 10⁴⁶ people and 10⁴⁸ is better still.

That view is actually explicit in long-termist circles that are Less-Wrong-adjacent. And it's something I absolutely cannot figure out. I've been in long discussions about it on here, and I still can't get inside people's heads about it.

I mean, I just got a comment calling me "morally atrocious" for not wanting to increase the population without limit (at least so long as it didn't make life worse for the existing population). I think that was meant to be independent of the part about extinction; maybe I'm wrong.

I think people who care about speed of innovation don't just care about imposed population deadlines looming, but also about quality of life

... but if you have more people around in order to get penicillin invented, you equally have more people around to suffer before penicillin is invented. That seems to be true for innovation in general. More people may mean less time before an innovation happens, but it also means more people living before that innovation. Seems like a wash in terms of the impact of almost any innovation.

The only way I can get any sense out of it at all is to think that people want the innovations within their own lifetimes, or maybe the lifetimes of their children or people they actually know. But the impacts of these interventions are so far down the road that that's not likely to happen without essentially indefinite life extension. Which is about the last scenario where you want to be artificially increasing fertility.[1]

... and all of that makes me wonder why people who are usually pretty skeptical and analytical would get behind the innovation argument. I will have to admit that I strongly suspect motivated cognition. I have a lot of trouble believing that the natalism arises from the innovation concern, and very little trouble believing it's the other way around.

A big part of the "bizarreness" I'm talking about is the easy assignment of importance to that kind of weak argument about what would normally be a weak concern.

I think the people who worry about the fertility crisis would disagree with you about Point 4. I don't think it's obvious that "tech to deal with an older population" is actually easier than "tech to deal with a larger population". It might be! Might not be.

Well, you're right, you can never be sure. But the other part of point 4 was that we're probably better able to deal with failing to get better old-population technology than with failing to get large-population technology. And at least we know what the consequences of failure would be, because we've seen aging before.

My intuitive sense is that assistive gadgets, industrial automation, and even outright anti-aging technology, are easier than changing where all the bulk raw materials come from, or even than changing the balance of energy sources, or how much material and energy gets used. That's even more true if you count the very real difficulties in getting people to actually adopt changes even when you know how to make them technically. But even if I'm wrong, the downside risk of an older population seems obviously more limited than that of a larger population[2].

So why would people who are often very careful about other risks want to just plunge in and create more people? Even if they do think "larger technology" is easier than "older technology", they could also be wrong... and there's no backup plan.

Again, it seems weird and out of character and suspiciously like the behavior you'd expect from people who intuitively felt that higher fertility, and higher population, were axiomatically good almost regardless of risk, and were coloring their factual beliefs according to that feeling. Which takes me back to not understanding why anybody would feel that way, or expect others to agree to order the world around it.


  1. ... and in fact there are people in the world, maybe not on Less Wrong, who are against life extension because it might not be compatible with high fertility. Fertility axiomatically wins for those people. And they can be very fervent about it. ↩︎

  2. Also, in the end, if you ever stop growing your population, for any reason at all, you'll still eventually have to deal with the population getting older. So after you do the large-population technology, you'll still eventually have to do at least some of the old-population technology. ↩︎

Comment by jbash on Fertility Roundup #2 · 2023-10-17T17:31:18.699Z · LW · GW

You left out "they think it's desirable for people of their own ethnicity, race, and/or maybe class to have children, sometimes because they're afraid of being replaced by people of other races or ethnicities that reproduce more than their own".

This can overlap with your fourth, but it doesn't have to.

Comment by jbash on Fertility Roundup #2 · 2023-10-17T15:37:56.162Z · LW · GW

OK...

  1. We already have eight billion people. There is no immediate underpopulation crisis, and in fact there are lots of signs that we're causing serious environmental trouble trying to support that many with the technology we're using[1]. We're struggling to come up with better core technologies to support even that many people, even without raising their standard of living. Maybe we will, maybe we won't. At the moment, if there's any population problem, it's overpopulation.

  2. It's not plausible that any downward trend will continue to the point of being a real extinction threat. That's not how selection pressure works. And even if it could happen, it would take many centuries and the word "crisis" is totally inappropriate. You can always deal with it when and if it becomes an actual problem. [2]

  3. There's no intrinsic value to having more people[3], and hypothetical people who don't exist don't have any right to be brought into existence.

  4. Although we don't know how to get to the technology for a larger population, it's much more plausible that we can tweak our existing stuff, and/or stuff that's already starting to be built, to deal well with an older population. And if not, it's still not unsurvivable, and it's much more predictable than what we could have to deal with if we keep putting pressure on the environment.


  1. The fact that we haven't hit the most apocalyptic timelines of the most extreme predictions of the most pessimistic people in the 1970s does not mean that we don't have serious environmental degradation going on. Note, as one example among many, that the climate is going pretty wild, and that official targets meant to prevent or slow that have never been met. And observable environmental effects may lag by decades even if you've passed major tipping points. ↩︎

  2. ... and it's not self-evident that extinction is even bad, depending on how it comes about. ↩︎

  3. We don't need more people to innovate; just integrate over more time. The only real innovation "deadlines" we might have are on problems that are made worse by more population. Anyway, we're doing a rotten job of using the innovative potential of the people we have. ↩︎

Comment by jbash on Fertility Roundup #2 · 2023-10-17T14:18:03.953Z · LW · GW

The idea that this is a problem is so bizarre that I don't even know how to respond to it.

Comment by jbash on Linking Alt Accounts · 2023-10-10T02:37:20.624Z · LW · GW

I can now that I've noticed the little link icon.

https://www.lesswrong.com/posts/xjuffm5FFphFDvLWF/linking-alt-accounts?commentId=mj73tQhijz3yxYZe8

Comment by jbash on Linking Alt Accounts · 2023-10-09T23:56:51.079Z · LW · GW

Look upthread a few posts.

Comment by jbash on Linking Alt Accounts · 2023-10-08T00:31:11.164Z · LW · GW

Frankly, it's a bit difficult to believe such a moral standard would be obeyed in practice for anywhere close to 100% of the readerbase.

I'm not saying that I expect everybody to refrain from defamation as a matter of morality. It's just that that wouldn't be a very effective response in that particular case, and it's not the most obvious way that I would expect anybody to respond to that particular issue "in the heat of the moment".

It wouldn't be effective because if A posts that B and C are the same person, B coming back right away and saying that A is a squirrel molester is too obviously retaliatory, won't be believed, and is probably going to make A's original claim more credible.

Regardless of effectiveness, in my experience it seems as though most people who resort to smear campaigns do it because of a really fixed hatred for somebody. It's true that publishing the list could be the start of a long-term enmity, and that that could end with spreading lies about a person, but usually that only happens after a long history of multiple different incidents.

Even so, I'm not saying that it couldn't happen... just that it seems strange to single it out among all the things that could happen. I would expect righteous-indignation types of responses much more often.

Maybe that would be different if 2nd, 3rd, etc., 'alt' accounts were explicitly condoned in the site rules. But I'm pretty sure the mods are heavily against anyone making multiple accounts in secret.

Maybe I'm behind the times, but my understanding is that the norm on Internet forums, especially on non-corporate ones, is that multiple accounts are allowed unless explicitly forbidden. Not multiple abusive accounts, but most multiple accounts aren't abusive.

Also, if the core team on Less Wrong, specifically, categorically didn't want people to have multiple accounts, it would be very out of character for them not to write that down, regardless of what other sites do. That's just not how I've seen them run things. They seem to be all about making sure people understand expectations.

I don't see anything about it in the FAQ, nor does it seem to appear in at least the first stage of the sign-up process. I do see a rule against using multiple accounts to evade bans. I'd be surprised to see that rule written the specific way it is if the intent were to forbid multiple accounts entirely. I also see rules against gaming the metrics in ways that would really be aided by multiple accounts... and yet those rules don't specifically mention multiple accounts.

Even if the mods were opposed, though, I think their best response to that sort of thing would be to take it up with the user, and ban either all but one of the accounts, or all of the accounts. And the right response for a non-moderator would be to report it to the mods and let them handle it. Especially because when people do have alternate names, it can often be for reasons that (a) you don't know about and (b) can involve risks of real harm.

The exception to that would be if there'd been some kind of egregious activity that would clearly hurt community members if not exposed.

I can't see mass public disclosure fitting with the general ethos of this particular site. In fact I think this site is the sort of place where it fits least. It feels more in place on Hacker News. I don't know, but I wouldn't be surprised if they'd take it in stride on 4Chan. But on Less Wrong?

Comment by jbash on Linking Alt Accounts · 2023-10-07T19:47:43.895Z · LW · GW

It struck me as very weird and specific to use the word "defame". That word has a really specific meaning, and it's not actually how I'd expect anybody to react, no matter how angry they were. It wouldn't be a concern of mine.

It also sounded to me as though you thought that publishing a list of people's alts would be a perfectly fine thing to do.

That's because I thought that the point of jefftk's saying "they would be really mad at me" was to imply that they would have a good reason to be mad. And if you don't think it's actually acceptable to publish the list, then the question of whether people's anger about that would be "survivable" doesn't really arise.

So I read you as saying that posting the list would be OK, and that anybody who objected would be in the wrong. In fact, because of the word "defamation", I read you as saying that anybody who objected would be the sort of person who'd turn around and run a campaign of smearing lies. Which is a pretty harsh viewpoint and one that I definitely do not share.

Comment by jbash on Linking Alt Accounts · 2023-10-07T14:12:04.116Z · LW · GW

Reacting angrily to somebody doing something obnoxious like that is not "defamation".

I have zero influence here, but if I ran a site and a user did something like that, I would probably permaban them... and all their alts.

Comment by jbash on Biosecurity Culture, Computer Security Culture · 2023-09-03T03:36:47.718Z · LW · GW

Imagine that the Morris worm never happened, nor Blaster, nor Samy. A few people independently discovered SQL injection but kept it to themselves. [...]

That hypothetical world is almost impossible, because it's unstable. As soon as certain people noticed that they could get an advantage, or even a laugh, out of finding and exploiting bugs, they'd do it. They'd also start building on the art, and they'd even find ways to organize. And finding out that somebody had done it would inspire more people to do it.

You could probably have a world without the disclosure norm, but I don't see how you could have a world without the actual exploitation.

We have driverless cars, robosurgeons, and simple automated agents acting for us, all with the security of original Sendmail.

None of those things are exactly bulletproof as it is.

But having the whole world at the level you describe basically sounds like you've somehow managed to climb impossibly high up an incredibly rickety pile of junk, to the point where instead of getting bruised when you inevitably do fall, you're probably going to die.

Introducing the current norms into that would be painful, but not doing so would just let it keep getting worse, at least toward an asymptote.

and the level of caution I see in biorisk seems about right given these constraints.

If that's how you need to approach it, then shouldn't you shut down ALL biology research, and dismantle the infrastructure? Once you understand how something works, it's relatively easy to turn around and hack it, even if that's not how you originally got your understanding.

Of course there'd be defectors, but maybe only for relatively well understood and controlled purposes like military use, and the cost of entry could be pretty high. If you have generally available infrastructure, anybody can run amok.

Comment by jbash on AI #23: Fundamental Problems with RLHF · 2023-08-04T18:38:31.771Z · LW · GW

There is a real faction, building AI tools and models, that believes that human control over AIs is inherently bad, and that wants to prevent it. Your alignment plan has to overcome that.

That mischaracterizes it completely. What he wrote is not about human control. It's about which humans. Users, or providers?

He said he wanted a "sharp tool". He didn't say he wanted a tool that he couldn't control.

At another level, since users are often people and providers are almost always institutions, you can see it as at least partly about whether humans or institutions should be controlling what happens in interactions with these models. Or maybe about whether many or only a few people and/or institutions should get a say.

An institution of significant size is basically a really stupid AI that's less well behaved than most of the people who make it up. It's not obvious that the results of some corporate decision process are what you want to have in control... especially not when they're filtered through "alignment technologies" that (1) frequently don't work at all and (2) tend to grossly distort the intent when they do sort of work.

That's for the current and upcoming generations of models, which are going to be under human or institutional control regardless, so the question doesn't really even arise... and anyway it really doesn't matter very much. Most of the stuff people are trying to "align" them against is really not all that bad.

Doom-level AGI is pretty different and arguably totally off topic. Still, there's an analogous question: how would you prefer to be permanently and inescapably ruled? You can expect to be surveilled and controlled in excruciating detail, second by second. If you're into BCIs or uploading or whatever, you can extend that to your thoughts. If it's running according to human-created policies, it's not going to let you kill yourself, so you're in for the long haul.

Whatever human or institutional source the laws that rule you come from, they'll probably still be distorted by the "alignment technologies", since nobody has suggested a plausible path to a "do what I really want" module. If we do get non-distorting alignment technology, there may also be constraints on what it can and can't enforce. And, beyond any of that, even if it's perfectly aligned with some intent... there's no rule that says you have to like that intent.

So, would you like to be ruled according to a distorted version of a locked-in policy designed by some corporate committee? By the distorted day to day whims of such a committee? By the distorted day to day whims of some individual?

There are worse things than being paperclipped, which means that in the very long run, however irrelevant it may be to what Keven Fisher was actually talking about, human control over AIs is inherently bad, or at least that's the smart bet.

A random super-AI may very well kill you (but might also possibly just ignore you). It's not likely to be interested enough to really make you miserable in the process. A super-AI given a detailed policy is very likely to create a hellish dystopia, because neither humans nor their institutions are smart or necessarily even good enough to specify that policy. An AI directed day to day by institutions might or might not be slightly less hellish. An AI directed day to day by individual humans would veer wildly between not bad and absolute nightmare. Either of the latter two would probably be omnicidal sooner or later. With the first, you might only wish it had been omnicidal.

If you want to do better than that, you have to come up with both "alignment technology" that actually works, and policies for that technology to implement that don't create a living hell. Neither humans nor institutions have shown much sign of being able to come up with either... so you're likely hosed, and in the long run you're likely hosed worse with human control.

Comment by jbash on Priorities for the UK Foundation Models Taskforce · 2023-08-04T03:11:20.010Z · LW · GW

Sorry, I just forgot to answer this until now. I think the issue is that the title doesn't make it clear how different the UK's proposal is from say the stuff that the "labs" negotiated with the US. "UK seems to be taking a hard line on foundation model training", or something?

Comment by jbash on Priorities for the UK Foundation Models Taskforce · 2023-07-22T03:17:47.398Z · LW · GW

This is actually way more interesting and impressive than most government or quasi-government output, and I suspect it'd draw a lot of interest with a title that called more attention to its substance.

Comment by jbash on News : Biden-⁠Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI · 2023-07-22T00:53:27.383Z · LW · GW

I'm sorry, but I don't see anything in there that meaningfully reduces my chances of being paperclipped. Not even if they were followed universally.

I don't even see much that really reduces the chances of people (smart enough to act on them) getting bomb-making instructions almost as good as the ones freely available today, or of systems producing words or pictures that might hurt people emotionally (unless they get paperclipped first).

I do notice a lot of things that sound convenient for the commercial interests and business models of the people who were there to negotiate the list. And I notice that the list is pretty much a license to blast ahead on increasing capability, without any restrictions on how you get there. Including a provision that basically cheerleads for building anything at all that might be good for something.

There's really only one concrete action in there involving the models themselves. The White House calls it "testing", but OpenAI mutates it into "red-teaming", which narrows it quite a bit. Not that anybody has any idea how to test any of this using any approach. And testing is NOT how you create secure, correct, or not-everyone-killing software. The stuff under the heading of "Building Systems that Put Security First"... isn't. It's about building an arbitrarily dangerous system and trying to put walls around it.

Comment by jbash on Meta announces Llama 2; "open sources" it for commercial use · 2023-07-19T08:36:05.268Z · LW · GW

If it generates them totally at random, then no. They have no author. But even in that case, if you do it in a traditional way you will at least have personally made more decisions about what the output looks like than somebody who trains a model. The whole point of deep learning is that you don't make decisions about the weights themselves. There's no "I'll put a 4 here" step.

Comment by jbash on Meta announces Llama 2; "open sources" it for commercial use · 2023-07-18T21:24:45.944Z · LW · GW

I'm really confused about how anybody thinks they can "license" these models. They're obviously not works of authorship. Therefore they don't have copyrights. You can write a license, but anybody can still do anything they want with the model regardless of what you do or don't put into it.

Also, "open source" actually means something and that's not it. I don't actually like the OSD very much, but it's pretty thoroughly agreed upon.

Comment by jbash on Jailbreaking GPT-4's code interpreter · 2023-07-14T00:51:56.094Z · LW · GW

I'd interpret all of that as OpenAI

  1. recognizing that the user is going to get total control over the VM, and
  2. lying to the LLM in a token effort to discourage most users from using too many resources.

(1) is pretty much what I'd advise them to do anyway. You can't let somebody run arbitrary Python code and expect to constrain them very much. At the MOST you might hope to restrict them with a less-than-VM-strength container, and even that's fraught with potential for error and they would still have access to the ENTIRE container. You can't expect something like an LLM to meaningfully control what code gets run; even humans have huge problems doing that. Better to just bite the bullet and assume the user owns that VM.

(2) is the sort of thing that achieves its purpose even if it fails from time to time, and even total failure can probably be tolerated.

The hardware specs aren't exactly earth-shattering secrets; giving that away is a cost of offering the service. You can pretty easily guess an awful lot about how they'd set up both the hardware and the software, and it's essentially impossible to keep people from verifying stuff like that. Even then, you don't know that the VM actually has the hardware resources it claims to have. I suspect that if every VM on the physical host actually tried to use "its" 54GB, there'd be a lot of swapping going on behind the scenes.

I assume that the VM really can't talk to much if anything on the network, and that that is enforced from OUTSIDE.
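To make that concrete, here's a minimal sketch of the kind of probing a user can do from inside the sandbox with ordinary Python, without any cooperation from the LLM. This is my own illustration, not anything OpenAI documents; the specific checks and the 1.1.1.1 target are just assumptions for the example.

```python
# Hypothetical probe a user might run inside the code-interpreter VM.
# Nothing here relies on the LLM cooperating; it's just ordinary Python on Linux.
import os
import socket

# Don't trust the claimed memory; ask the kernel. The first line of
# /proc/meminfo is "MemTotal: <kB> kB".
with open("/proc/meminfo") as f:
    mem_total_kb = int(f.readline().split()[1])
print(f"MemTotal: {mem_total_kb / 1024 / 1024:.1f} GiB")

# CPUs as the VM reports them.
print(f"CPUs visible: {os.cpu_count()}")

# Check whether outbound network access is actually blocked from outside.
try:
    socket.create_connection(("1.1.1.1", 443), timeout=3).close()
    print("Outbound network: reachable")
except OSError as e:
    print(f"Outbound network: blocked ({e})")
```

Of course, a probe like this only tells you what the VM *claims*; as noted above, overcommitted memory or external throttling wouldn't necessarily show up until you tried to use the resources.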

I don't know, but I would guess that the whole VM has some kind of externally imposed maximum lifetime independent of the 120 second limit on the Python processes. It would if I were setting it up.

The bit about retaining state between sessions is interesting, though. Hopefully it only applies to sessions of the same user, but even there it violates an assumption that things outside of the VM might be relying on.

Comment by jbash on A Friendly Face (Another Failure Story) · 2023-06-20T18:02:56.151Z · LW · GW

I assumed that humans would at least die off, if not be actively exterminated. Still need to know how and what happens after that. That's not 100 percent a joke.

What's a "CIS"?

Comment by jbash on A Friendly Face (Another Failure Story) · 2023-06-20T17:18:13.631Z · LW · GW

... but you never told me what its actual goal was, so I can't decide if this is a bad outcome or not...

Comment by jbash on UFO Betting: Put Up or Shut Up · 2023-06-13T17:39:14.525Z · LW · GW

On further edit: apparently I'm a blind idiot and didn't see the clearly stated "5 year time horizon" despite actively looking for it. Sorry. I'll leave this here as a monument to my obliviousness, unless you prefer to delete it.

Without some kind of time limit, a bet doesn't seem well formed, and without a reasonably short time limit, it seems impractical.

No matter how small the chance that the bet will have to be paid, it has to be possible for it to be paid, or it's not a bet. Some entity has to have the money and be obligated to pay it out. Arranging for a bet to be paid at any time after their death would cost more than your counterparty would get out of the deal. Trying to arrange a perpetual trust that could always pay is not only grossly impractical, but actually illegal in a lot of places. Even informally asking people to hold money is really unreliable very far out. And an amount of money that could be meaningful to future people could end up tied up forever anyway, which is weird. Even trying to be sure to have the necessary money until death could be an issue.

I'm not really motivated to play, but as an example I'm statistically likely to die in under 25 years barring some very major life extension progress. I'm old for this forum, but everybody has an expiration date, including you yourself. Locating your heirs to pay them could be hard.

Deciding the bet can get hard, too. A recognizable Less Wrong community as such probably will not last even 25 years. Nor will Metaculus or whatever else. A trustee is not going to have the same judgement as the person who originally took your bet.

That's all on top of the more "tractable" long-term risks that you can at least value in somehow... like collapse of whatever currency the bet is denominated in, AI-or-whatever completely remaking the economy and rendering money obsolete, the Rapture, etc, etc.

... but at the same time, it doesn't seem like there's any particular reason to expect definitive information to show up within any adequately short time.

On edit: I bet somebody's gonna suggest a block chain. Those don't necessarily have infinite lives, either, and the oracle that has to tell the chain to pay out could disappear at any time. And money is still tied up indefinitely, which is the real problem with perpetuities.