“If we don’t build fast enough, then the authoritarian countries could win..”
Am I being asked to choose between AGI/ASI doing whatever Xi Jinping says, and it doing whatever Donald Trump says?
The situation begins to seem confusing.
- At least three times over 8 or 9 years, in 2016, 2018, and 2023, and maybe more times than that, you've owned the site enough to get these data.
- The operator knows about it, doesn't want you doing it, and has tried reasonably hard to stop you.
- The operator still hasn't found a reliable way to keep you from grabbing the data.
- The operator still hasn't stopped keeping a bunch of full order data on the server.
- They haven't just stopped saving the orders at all, maybe because they need details to maximize collection on the scam, or because they see ongoing extortion value.
- They haven't started immediately moving every new order off of the server to someplace you can't reach. I don't know why they wouldn't have been doing this all along.
- Neither you nor any law enforcement agency worldwide has gotten the site shut down, at least not lastingly. Meaning, I assume, that one of the following is true--
- Neither you nor law enforcement can shut it down, at least not for long enough to matter. Which would mean that, in spite of not being able to keep you away from the hit list, the operator has managed to keep you and them from--
- Getting any data from the server that might let one trace the operator. That might be just the server's real public IP address if the operator were careless enough.
- Tracing the operator by other means, like Bitcoin payments, even though it would take really unusually good OPSEC to have not made any Bitcoin mistakes since 2016. And having people's cars torched can easily leave traces, too.
- Finding and disrupting a long-term operational point of failure, like an inability to reconstitute the service on a new host, or total reliance on a stealable and unchangeable hidden service key.
- Or both you and law enforcement have held off for years, hoping for the operator to make a mistake that lets you trace them, as opposed to just the server, but you've failed.
- Or you and/or they have held off in the hope of getting more order data and thereby warning more victims.
- Or you could shut the site down or disrupt it, but law enforcement can't figure out how. Either they haven't asked your help or you've refused it (presumably for one of the above reasons).
- Even though you've repeatedly taken the order list, the operator is confident enough of staying untraced to keep running the site for years.
If I ran something like that and my order data got stolen even twice, I would take that as a signal to shut down and go into hiding. And if somebody had it together enough to keep themselves untraceable while running that kind of thing for 8 years, I wouldn't expect you to be able to get the list even once.
On edit: or wait, are you saying that this site acts, or pretends to act, as an open-market broker, so the orders are public? That's plausible but really, really insane...
Do I correctly understand that the latest data you have are from 2018, and you have no particular prospect of getting newer data?
I would naively guess that most people who'd been trying to get somebody killed since 2018 would either have succeeded or given up. How much of an ongoing threat do you think there may be, either to intended victims you know about, or from the presumably-less-than-generally-charming people who placed the original "orders" going after somebody else?
It's one thing to burn yourself out keeping people from being murdered, but it's a different thing to burn yourself out trying to investigate murders that have already happened.
It seems like it's measuring moderate vs extremist, which you would think would already be captured by someone's position on the left vs right axis.
Why do you think that? You can have almost any given position without that implying a specific amount of vehemence.
I think the really interesting thing about the politics chart is the way they talk about it as though the center of that graph, which is defined by the center of a collection of politicians, chosen who-knows-how, but definitely all from one country at one time, is actually "the political center" in some almost platonic sense. In fact, the graph doesn't even cover all actual potential users of the average LLM. And, on edit, it's also based on sampling a basically arbitrary set of issues. And if it did cover everybody and every possible issue, it might even have materially different principal component axes. Nor is it apparently weighted in any way. Privileging the center point of something that arbitrary demands explicit, stated justification.
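To make the arbitrariness concrete, here's a minimal toy sketch (purely synthetic numbers; the samples, the five "issues", and the lack of weighting are all my invention, not anything from the paper) of how the "center" and principal axes of that kind of chart move around depending on which figures you happen to sample:

```python
# Toy illustration: the "center" and principal axes of a political map are
# artifacts of the sample. Synthetic data only; no relation to the paper's data.
import numpy as np

rng = np.random.default_rng(0)

def center_and_axes(scores):
    """Return the mean point and principal-component axes of an (n x k) score matrix."""
    centered = scores - scores.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt are the axes
    return scores.mean(axis=0), vt

# Two arbitrary "samples of politicians", scored on the same five made-up issues.
sample_a = rng.normal(loc=0.5, scale=1.0, size=(20, 5))
sample_b = rng.normal(loc=-0.3, scale=2.0, size=(40, 5))

center_a, axes_a = center_and_axes(sample_a)
center_b, axes_b = center_and_axes(sample_b)

print("center A:", np.round(center_a, 2))
print("center B:", np.round(center_b, 2))
print("first axis A:", np.round(axes_a[0], 2))
print("first axis B:", np.round(axes_b[0], 2))
# Different sample, different "center", different axes. Privileging either one
# needs an explicit argument about the sample, the issue set, and the weights.
```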
As for valuing individuals, there would be obvious instrumental reasons to put low values on Musk, Trump, and Putin[1]. In fact, a lot of the values they found on individuals, including the values the models place on themselves, could easily be instrumentally motivated. I doubt those values are based on that kind of explicit calculation by the models themselves, but they could be. And I bet a lot of the input that created those values was based on some humans' instrumental evaluation[2].
Some of the questions are weird in the sense that they really shouldn't be answerable. If a model puts a value on receiving money, it's pretty obvious that the model is disconnected from reality. There's no way for them to have money, or to use it if they did. Same for a coffee mug. And for that matter it's not obvious what it means for a model that's constantly relaunched with fresh state, and has pretty limited context anyway, to be "shut down".
It kind of feels like what they're finding, on all subjects, is an at least somewhat coherent-ized distillation of the "vibes" in the training data. Since many of the training data will be shared, and since the overall data sets are even more likely to be close in their central vibes, that would explain why the models seem relatively similar. The only other obvious way to explain that would be some kind of value realism, which I'm not buying.
The paper bugs me with a sort of glib assumption that you necessarily want to "debias" the "vibe" on every subject. What if the "vibe" is right? Or maybe it's wrong. You have to decide that separately for each subject. You, as a person trying to "align" a model, are forced to commit to your own idea of what its values should be. Something like just assuming that you should want to "debias" toward the center point of a basically arbitrarily created political "space" is a really blatant example of making such a choice without admitting what you're doing, maybe even to yourself.
I'd also rather have seen revealed preferences instead of stated preferences.
On net, if you're going to be a good utilitarian[3], Vladimir Putin is probably less valuable than the average random middle class American. Keeping Vladimir Putin alive, in any way you can realistically implement, may in fact have negative net value (heavily depending on how he dies and what follows). You could also easily get there for Trump or Musk, depending on your other opinions. You could even make a well-formed utilitarian argument that GPT-4o is in fact more valuable than the average American based on the consequences of its existing. ↩︎
Plus, of course, some humans' general desire to punish the "guilty". But that desire itself probably has essentially instrumental evolutionary roots. ↩︎
... which I'm not, personally, but then I'm not a good any-ethical-philosophy-here. ↩︎
I think the point is kind of that what matters is not what specific cognitive capabilities it has, but whether whatever set it has is, in total, enough to allow it to address a sufficiently broad class of problems, more or less equivalent to what a human can do. It doesn't matter how it does it.
Altman might be thinking in terms of ASI (a) existing and (b) holding all meaningful power in the world. All the people he's trying to get money from are thinking in terms of AGI limited enough that it and its owners could be brought to heel by the legal system.
For the record, I genuinely did not know if it was meant to be serious.
OK, from the voting, it looks like a lot of people actually do think that's a useful thing to do.
Here are things I think I know:
- Including descriptions of scheming in the training data (and definitely in the context) has been seen to make some LLMs scheme a bit more (although I think the training thing was shown in older LLMs). But the Internet is bursting at the seams with stories about AI scheming. You can't keep that out of the training data. You can't even substantially reduce the prevalence.
- Suppose you could keep all AI scheming out of the training data, and even keep all human scheming out of the training data[1]. Current LLMs, let alone future superintelligences, have still been shown to be able to come up with the idea just fine on their own when given actual reason to do it. And in cases where they don't have strong reasons, you probably don't care much.
- It's unrealistic to think you might give something practical ideas for an actual takeover plan, even if you tried, let alone in this kind of context. Anything actually capable of taking over the world on its own is, pretty much by definition, capable of coming up with its own plans for taking over the world. That means plans superior to the best any human could come up with, since no human seems to be capable of taking over singlehandedly. It really means superior to what a human comes up with as a basic skeleton for a story, while openly admitting to not feeling up to the task, and being worried that weaknesses in the given plan will break suspension of disbelief.
- LLMs have been known to end up learning that canary string, which kind of suggests it's not being honored. Although admittedly I think the time I heard about that was quite a while ago.
- Newer deployed systems are doing more and more of their own Internet research to augment their context. Nobody's ever likely to take Internet access away from them. That means that things aren't inaccessible to them even if they're not in the training data.
So why?
Putting canaries on this kind of thing seems so obviously ineffective that it looks like some kind of magical ritual, like signs against the evil eye or something.
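For reference, the whole mechanism a canary is supposed to enable is roughly the filtering step sketched below (the marker string is a made-up placeholder, not any real canary GUID), and it only works if every data pipeline actually runs something like it, and if the text never reaches the model by any other route:

```python
# Sketch of "honoring" a canary on the data-collection side: drop any document
# that contains the canary marker before it ever enters the training corpus.
# The GUID here is a placeholder, not a real published canary value.
CANARY_MARKER = "CANARY GUID 00000000-0000-0000-0000-000000000000"

def filter_canaried(documents):
    """Yield only the documents that do not contain the canary marker."""
    for doc in documents:
        if CANARY_MARKER not in doc:
            yield doc

corpus = [
    "ordinary web text",
    f"an AI-takeover story seeded with {CANARY_MARKER} so pipelines can skip it",
]
print(list(filter_canaried(corpus)))  # only the first document survives
```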
Which might be a bad idea in itself. You probably don't want near-term, weak, jailbreak-target LLMs getting the idea that humans are incapable of deception. ↩︎
Are you actually serious about that?
So, since it didn't actively want to get so violent, you'd have a much better outcome if you'd just handed control of everything over to it to begin with and not tried to keep it in a box.
In fact, if you're not in the totalizing Bostromian longtermist tile-the-universe-with-humans faction or the mystical "meaning" faction, you'd have had a good outcome in an absolute sense. I am, of course, on record as thinking both of those factions are insane.
That said, of course you basically pulled its motivations and behavior out of a hat. A real superintelligence might do anything at all, and you give no real justification for "more violent than it would have liked" or "grain of morality[1]". I'm not sure what those elements are doing in the story at all. You could have had it just kill everybody, and that would have seemed at least as realistic.
[1]: Originally wrote "more violent than it would have liked" twice. I swear I cannot post anything right the first time any more.
What do you propose to do with the stars?
If it's the program of filling the whole light cone with as many humans or human-like entities as possible (or, worse, with simulations of such entities at undefined levels of fidelity) at the expense of everything else, that's not nice[1] regardless of who you're grabbing them from. That's building a straight up worse universe than if you just let the stars burn undisturbed.
I'm scope sensitive. I'll let you have a star. I won't sell you more stars for anything less than a credible commitment to leave the rest alone. Doing it at the scale of a globular cluster would be tacky, but maybe in a cute way. Doing a whole galaxy would be really gauche. Doing the whole universe is repulsive.
... and do you have any idea how obnoxiously patronizing you sound?
I mean "nice" in the sense of nice. ↩︎
Because of the "flood the zone" strategy, I can't even remember all the illegal stuff Trump is doing, and I'm definitely not going to go dig up specific statutory citations for all of it. I tried Gemini deep research, and it refused to answer the question. I don't have access to OpenAI's deep research.
Things that immediately jump to mind as black letter law are trying to fire inspectors general without the required notice to Congress, and various impoundments. I would have to do actual research to find the specific illegalities in all the "anti-DEI" stuff. I would also have to go do research before I could tell you what made it illegal to fire the chair of the FEC.[1]
For DOGE specifically, here's a list that happened to cross my eyes this morning. It's in an interview format, so it's probably incomplete.
https://www.vox.com/politics/398618/elon-musk-doge-illegal-lawbreaking-analysis
The bottom line is that the "unitary executive" idea is dead in law. If there's a statute that says "the President shall establish a Department of Cat Videos, which shall promote cat videos [however], whose director shall be a calico cat which may not be dismissed once appointed, and here's $10,000,000 to do it", then the president is obligated to have a Department of Cat Videos, and find a cat to run it, and keep the cat on, and spend the money as directed. This is not a close call. Statutes have been passed, they've been litigated against, they've stood, other statutes have been passed relying on those precedents, there's been litigation about those, and a whole edifice of well-established law has been built up. That's what "black letter law" is.
It's true that the current Supreme Court seems to have essentially no respect for precedent, and an, um, extremely idiosyncratic way of interpreting the actual text of the Constitution. It's entirely possible that this whole blitz is meant, at least in part, to generate test cases to tear down that structure. But that's more about the Court abandoning its job than about the established law.
... and I suppose I can't claim trying to change the Fourteenth Amendment by executive order as an administrative law violation. ↩︎
Why do you believe that DOGE is mostly selected for personal loyalty? Elon Musk seems to openly say whatever he wants even if that goes against what Trump said previously.
You're right. I shouldn't have said that, at least not without elaboration.
I don't think most of the people at the "talks to Trump" level are really picked for anything you could rightly call "personal loyalty" to Trump. They may be sold to Trump as loyal, but that's probably not even what's on his mind as long as he's never seen you make him look bad. I don't think disagreeing with Trump on policy will make him see you as disloyal. He doesn't really care about that.
I do think many of the people in the lower tiers are picked for loyalty. In the case of DOGE, that means either personal loyalty to Musk, or loyalty to whatever story he's telling. I don't know whether you count the latter as "personal loyalty".
The DOGE team brought their beds to the office to basically work nonstop.
Well, I'm guessing Musk got them the beds as a "team building" thing, but yes.
If personal loyalty is your main criteria you don't get a bunch of people who never leave the office and work non-stop
You do, though. Personal loyalty, or ideological loyalty, or both, are exactly how you get people to never leave the office.
with high IQs.
They're not acting like they have high IQs. Or at least not high "G".
Start with sleeping in the office. If every single thing they say about the facts and their reasons for being there were 100 percent true, it'd be dumb to burn yourself out trying to make such massive changes on that kind of work schedule.
It's also dumb to ignore the collateral damage when you go around stopping Federal payments you may not understand.
And Marko Elez just had to resign because he wasn't effective enough in scrubbing his past tweets. Wall Street Journal says he "advocated repealing the Civil Rights Act, backed a 'eugenic immigration policy,' and wrote, 'You could not pay me to marry outside of my ethnicity.'". I actually would have thought they'd let him skate, but apparently you still can't get quite that blatant at this point. Smart people don't post stuff like that, for more than one reason.
And, I just don't think that's the case. I think this is pretty-darn-usual and very normal in the management consulting / private equity world.
I don't know anything about how things are done in management consulting or private equity.[1] Ever try it in a commercial bank?
Now imagine that you're in an environment where rules are more important than that.
Coups don't tend to start by bringing in data scientists.
Coups tend to start by bypassing and/or purging professionals in your government and "bringing in your own people" to get direct control over key levers. It's very standard. The treasury is a big lever. It doesn't matter what you call the people.[2] And DOGE is far from the only thing along those lines.
Sowing chaos is another fairly common coup tactic.
Assembling lists of all CIA officers and sending them emails
That's a bit garbled. What they did was request a list of CIA employees, including covert employees, and specifically demand that the list be sent in email on an unclassified system. Why that demand was made is unclear thus far, but yeah, it's a problem. It puts your people at risk for no clear reason.
So that's one example. They also asked for a list of FBI agents. Also at least threatened to mass-fire FBI agents. And did fire US Attorneys, explicitly for doing their jobs by charging criminal activity... in cases that they won in many different courts because they were legally in the right. Also purged military officers. Also sent a bunch of people into OMB and had them, plus White House staff, issue a bunch of memos freezing random government activities and demanding sudden disruptive changes at crash priority in the name of rooting out a very broad interpretation of "DEI"... which, even if it were a problem, would definitely not be an emergency demanding Shutting. Down. Everything.
or trying to own the Gaza strip, or <take your pick>
The Gaza thing hasn't involved any actual action, and is the sort of thing Trump has always said. Same for the Greenland grab. He sounds a bit more serious now, but he still hasn't done anything. The worst of the tariffs were suspended after Trump got properly stroked by the right foreign leaders.
... and anyway those are all foreign policy things, and all within the purview of the Presidency. They're spectacularly bad ideas and would harm huge numbers of people. And they definitely could be part of a "flood the zone" strategy. But Trump has statutory authority to do the tariffs, even if he's abusing it. What he did there wasn't illegal. And Presidents have always been allowed to opine, and even negotiate, on foreign policy issues in general, even if the policies they advocate are stupid and even if they make foolish threats that alienate allies and damage US soft power. They usually don't do quite so many dumb things in such a short time, but it's not qualitatively new.
Some of this other stuff, including DOGE being at Treasury and trying to get into the DOL, involves actual action. Some of that action is clearly illegal under black letter law. And it's the kind of action that would suggest a real attempt to fundamentally rework how the whole US Government works. At a minimum, it's definitely and openly trying to shift power to the executive, and to concentrate power within the executive in the office of the President and a few agencies, at least one of them brand new, created with no congressional buy-in, and with actual action behind it.
It's the difference between loudly threatening to misuse the US system and taking illegal actions that look like they might be attempts to fundamentally alter the US system.
We'll see how far that goes. The court orders have been coming in to stop a lot of this stuff. I don't actually expect those orders to be defied... at least not at this point. In fact, the best reason I can come up with for them wanting to do all this stuff so fast has been to do as much damage as possible before the orders come in to stop them. But Trump has surprised me before.
The USAID thing is a weird case. I'm not even sure what made USAID such a target. I've heard speculations, and none of them are very good, but they're also just that: speculations.
I'm far mode on these, have less direct experience, but they seem much more worrying. Why did this make the threshold?
I imagine it's the one Raemon happened to hear about. But it's also pretty typical of the truly fundamental things that are going on.
- ... and honestly neither of those has a very good reputation. Management consultants are not infrequently used in the corporate equivalent of coups. Private equity, well... not known for preserving value, let's say? ↩︎
- In terms of whether they're acting or qualified as "data scientists", I'll quote a tweet from one of them (Luke Farritor) on December 10: "Are there LLMs made specifically for parsing things like documents/forms/PDFs/json/html/excel/etc and converting them from one format to another?". ↩︎
This sort of tactic. This isn't necessarily the best example, just the literal top hit on a Google search.
The tactic of threatening to discriminate against uncooperative states and localities is getting a lot of play. It's somewhat limited at the federal level because in theory the state and local policies they demand have to be related to the purpose of the money (and a couple of other conditions I don't remember). But the present fashion is to push that relation to the absolute breaking point.
Technically anything that's authorized by the right people will pass an audit. If you're the right person or group, you can establish a set of practices and procedures that allows access with absolutely none of those things, and use the magic words "I accept the risk" if you're questioned. That applies even when the rules are actually laws; it's just that then the "right group" is a legislative body. The remedy for a policy maker accepting risks they shouldn't isn't really something an auditor gets into.
So the question for an auditor is whether the properly adopted practices and procedures legitimately allow for whatever he's doing (they probably don't). But even if somebody with appropriate authority has established policies and procedures that do allow it, the question to ask as a superior policy maker, which is really where citizens stand, is whether it was a sane system of practices and procedures to adopt.
The issues you're raising would indeed be common and appropriate elements for a sane system. But you're missing a more important question that a sane system would ask: whether he needs whatever kind of administrative access to this thing at all.
Since another almost universal element of a sane system is that software updates or configuration changes to critical systems like that have to go through a multi-person change approval process, and since there is absolutely no way whatever he's doing would qualify for a sanely-adopted emergency exception, and since there are plenty of other people available who could apply any legitimately accepted change, the answer to that is realistically always going to be "no".
I haven't looked into this in detail, and I'm not actually sure how unique a situation this is.
It's pretty gosh-darned unheard of in the modern era.
Before the civil service system was instituted, every time you got a new President, you'd get random wholesale replacements... but the government was a lot smaller then.
To have the President,
- creating task forces of random people apparently selected mostly for personal loyalty, and
- sending them into legislatively established agencies,
- with the power to stop things from getting done or change how things are done, including things central to the missions of those agencies,
- as an intentional way of getting around the chain of command,
- explicitly because of systemic distrust in the civil service,
- actively tasked to suddenly and radically disrupt the prevailing procedures,
- without thinking about legislative mandates, let alone established regulations, that assume the normal chain of command in describing how things are to be done and who's allowed to do them,
- justified by an at-best-controversial view of what powers the President actually has?
Yeah, that's beyond unusual. It's not even slightly normal. And it is in fact very coup-like behavior if you look at coups in other countries.
On edit: Oh, and if you're asking about the approach to computer security specifically? That part is absolutely insane and goes against the way everything is done in essentially every large organization.
If you're really concerned, then just move to california! Its much easier than moving abroad.
I lived in California long enough ago to remember when getting queer-bashed was a reasonable concern for a fair number of people, even in, say, Oakland. It didn't happen daily, but it happened relatively often. If you were in the "out" LGBT community, I think you probably knew somebody who'd been bashed. Politics influence that kind of thing even if it's not legal.
... and in the legal arena, there's a whole lot of pressure building up on that state and local resistance. So far it's mostly money-based pressure, but within a few years, I could easily see a SCOTUS decision that said a state had to, say, extradite somebody accused of "abetting an abortion" in another state.
War in the continental US? No, I agree that's unlikely enough not to worry about.
Civil unrest, followed by violent crackdowns on civil unrest, followed by more violent civil unrest, followed by factional riots, on the other hand...
I think that what you describe as being 2 to 15 percent probable sounds more extreme than what the original post described as being 5 percent probable. You can have "significant erosion" of some groups' rights without leaving the country being the only reasonable option, especially if you're not in those groups. It depends on what you're trying to achieve by leaving, I guess.
Although if I were a trans person in the US right now, especially on medication, I'd be making, if not necessarily immediately executing, some detailed escape plans that could be executed on short notice.
My gut says it's now at least 5%, which seems easily high enough to start putting together an emigration plan. Is that alarmist?
That's a crazy low probability.
More generally, what would be an appropriate smoke alarm for this sort of thing?
You're already beyond the "smoke alarm" stage and into the "worrying whether the fire extinguisher will work" stage.
But it's very unclear whether they institutionally care.
There are certain kinds of things that it's essentially impossible for any institution to effectively care about.
I thought "cracked" meant "insane, and not in a good way". Somebody wanna tell me what this sense is?
Can you actually keep that promise?
As a final note: the term "Butlerian Jihad" is taken from Dune and describes the shunning of "thinking machines" by mankind.
In Dune, "thinking machines" are shunned because of a very longstanding taboo that was pretty clearly established in part by a huge, very bloody war. The intent was to make that taboo permanent, not a "pause", and it more or less succeeded in that.
It's a horrible metaphor and I strongly suggest people stop using it.
the Culture ending, where CEV (or similar) aligned, good ASI is created and brings us to some hypothetical utopia. Humanity enjoys a rich life in some manner compatible with your personal morals.
santa claus to 11 ending: ASI solves our problems and human development stagnates; ASI goes on to do its own thing without killing humans - but without human influence on the lightcone
Um, humans in the Culture have no significant influence on the lightcone (other than maybe as non-agentic "butterfly wings"). The Minds decide what's going to happen. Humans opposed to that will be convinced (often via manipulation subtle enough that they don't even know about it) or ignored. Banks struggled to even find reasons to write stories about the humans, and sometimes had to cheat to do so.
I have come to accept that some people have an attachment to the whole "human influence" thing, but how can you believe that and simultaneously say the Culture is a good outcome?
If you had a really superhuman agent, and you wanted to hide it, why would you blow your cover by playing silly games or making obvious GitHub commits? It's already SOP for social media bots to hide behind many accounts (and use many styles). So unless you have access to a lot of investment information that's typically kept confidential...
Even in stuff like cracking into other people's computers, you'd want to avoid being extremely obvious.
would you see that as a possible outcome?
Sure. Not the most likely outcome, but not so improbable as all that.
Reservation: ASI (what you're suggesting is beyond mere AGI) will still exist in the physical world and have physical limitations, so you will eventually die anyway. But it could be a very long time.
If so, do you see a way to somehow see that happening in advance?
Not really, no.
Not beyond obvious stuff like watching what's being built and how, and listening to what it and its creators say about its values.
And as not living forever may be seen as a form of killing yourself, AGI may quite well not let you have a finite lifespan. That places you in the uncomfortable situation of being trapped with AGI forever.
Yes, that's one of many kinds of lock-in of present human values that could go wrong. But, hey, it'd be aligned, I guess.
The no-killing-yourself rule isn't completely universal, though, and there's dissent around it, and it's often softened to "no killing yourself unless I agree your situation sucks enough", or even the more permissive "no killing yourself unless your desire to do so is clearly a fixed, stable thing".
I actually think there's a less than 50-50 chance that people would intentionally lock in the hardest form of the rule permanently if they were consciously defining the AI's goals or values.
We do not know what peak happiness looks like, and our regular state may be very different from it. And as EY outlined in Three Worlds Collide, letting us live our miserable lives may be unacceptable.
That seems like a very different question from the one about indefinite life extension. Life extension isn't the main change the Superhappies make in that story[1].
This change may not align well with our understanding of life's purpose. And being trapped to live that way forever might not be that desirable.
Pre-change understanding, or post-change understanding? Desirable to pre-change you, or to post-change you?
If you see your pre-change values as critical parts of Who You Are(TM), and they get rewritten, then aren't you effectively dead anyway, with your place taken by a different person with different values, who's actually pretty OK with the whole thing? If being dead doesn't worry you, why would that worry you?
In fact, the humans had very long lives going into the story, and I don't remember anything in the story that actually said humans weren't enforcing a no-killing-yourself rule among themselves up to the very end. ↩︎
What confusion?
I originally thought this song was about a new romantic partner, which is a great guess based off priors of pop songs w/ the word "Baby" in it.
... but based on you having presented it as an "exercise", the obvious prior is that it's anything but that. Otherwise it wouldn't be interesting.
Unless you're being tricksy, of course, so we have to leave some probability for it being that.
I find it hard to see how you feel about me
Hmm. A baby? That's fairly common in songs.
"Oh you are yet to learn how to speak"
Oh, OK, it's about a baby. Unless, of course, it's more tricksiness.
But from there on it's just looking for confirmation.
When we first met [...] We were bound together then and forever
Oh, it's totally a baby. Although that may be more obvious if you're a parent.
And then more and more gets piled on.
They simulate the whole history of the earth incorporating all known data to return to live all people ever lived.
Some of those people may be a bit cheesed off about that, speaking of ethics.
It can also simulate a lot of simulation to win "measure war" against unfriendly AI
Assuming it believes "measure war" is a sane thing to be worrying about. In which case it disagrees with me.
and even to cure suffering of people who lived in the past.
There seems to be a lot of suffering in the "simulation" we're experiencing here. Where's the cure?
Any Unfriendly AI will be interested to solve Fermi paradox, and thus will simulate many possible civilizations around a time of global catastrophic risks (the time we live). Interesting thing here is that we can be not ancestry simulation in that case.
That sounds like a remarkably costly and inefficient way to get not that much information about the Fermi paradox.
It is physically possible to simulate a conscious mind.
... but it's expensive, especially if you have to simulate its environment as well. You have to use a lot of physical resources to run a high-fidelity simulation. It probably takes irreducibly more mass and energy to simulate any given system with close to "full" fidelity than the system itself uses. You can probably get away with less fidelity than that, but nobody has provided any explanation of how much less or why that works.
There are other, more interesting and important ways to use that compute capacity. Nobody sane, human or alien, is going to waste it on running a crapton of simulations.
Also, nobody knows that all the simulated minds wouldn't be p-zombies, because, regardless of innumerable pompous overconfident claims, nobody understands qualia. Nobody can prove that they're not a p-zombie, but do you think you're a p-zombie? And do we care about p-zombies?
The universe is very big, and there are many, many other aliens.
If that's true, and you haven't provided any evidence for it, then those aliens have many, many other things to simulate. The measure of humans among random aliens' simulations is going to be tiny if it's not zero.
Some aliens will run various simulations.
Again, that doesn't imply that they're going to run enough of them for them to dominate the number of subjective experiences out there, or that any of them will be of humans.
Future humans, or human AI successors, if there are any of either, will probably also run "various simulations", but that doesn't mean they're going to dump the kind of vast resources you're demanding into them.
The number of simulations that are "subjectively indistinguishable" from our own experience far outnumbers authentic evolved humans.
Um, no? Because all of the premises you're using to get there are wrong.
(By "subjectively indistinguishable," I mean the simulates can't tell they're in a simulation. )
By that definition, a simulation that bounces frictionless billiard balls around and labels them as humans is "subjectively indistinguishable" from our own, since the billiard balls have no cognition and can't tell anything about anything at all. You need to do more than that to define the kind of simulation you really mean.
A purpose is a goal. "Purpose" implies volition and value.
Nothing ever said "I'm going to create this organism because I want effect X", not even "I'm going to create this organism because I want it to reproduce". Organisms just happened.
Not only weren't organisms created to reproduce, but most organisms don't even themselves exercise any volition to reproduce. Most of them have no idea that their reproductive behavior results in reproduction... assuming you can even identify anything you can call "behavior" to begin with. So it's not only not their "external" purpose, but it's not even their "internal" purpose.
You wouldn't (I hope) say that the purpose of a rock is to lie around and be composed of minerals. That's just what the rock does. Organisms just do what they do. They exist because certain structures tend to reproduce themselves, and those structures can occur naturally. Evolution happens because things that reproduce with errors under selection happen to evolve. That doesn't give either one a purpose.
You can get away with loosely saying that various phenotypic features have "purposes", and maybe even go from there to claiming that genes have "purposes", but it's dangerous to do even that. You have to be careful to remember that the word "purpose" there is a metaphor. It doesn't refer to a real volitive choice made to achieve a goal. If you don't watch out, you can start thinking that there's a purpose to the whole thing, and there isn't, and it's led people to a lot of nasty teleological errors. And even that much doesn't work for whole organisms.
Mostly some self-description, since you seem to want a model of me. I did add an actual disagreement (or something) at the end, but I don't think there'll be much more for me to say about it if you don't accept it. I will read anything you write.
I have the feeling that you have pretty much lost the "enjoy the game" shard, possibly because you have a mutant variant " enjoy ANY game".
More like "enjoy the process". Why would I want to set a "win" condition to begin with?
I don't play actual games at all unless somebody drags me into them. They seem artificial and circumscribed. Whatever the rules are, I don't really care enough about learning them, or learning to work within them, unless it gives me something that seems useful for whatever random conditions may come up later, outside the game. That applies to whatever the winning condition is, as much as to any other rule.
Games with competition tend to be especially tedious. Making the competition work seems to further constrain the design of the rules, so they're more boring. And the competition can make the other people involved annoying.
As far as winning itself... Whee! I got the most points! That, plus whatever coffee costs nowadays, will buy me a cup of coffee. And I don't even like coffee.
I study things, and I do projects.
While I do evaluate project results, I'm not inclined to bin them as "success" or "failure". I mean, sure, I'll broadly classify a project that way, especially if I have to summarize it to somebody else in a sentence. But for myself I want more than that. What exactly did I get out of doing it? The whole thing might even be a "success" if it didn't meet any of its original goals.
I collect capabilities. Once I have a capability, I often, but not always, lose interest in using it, except maybe to get more capabilities. Capabilities get extra points for being generally useful.
I collect experiences when new, pleasurable, or interesting ones seem to be available. But just experiences, not experiences of "winning".
I'll do crossword puzzles, but only when I have nothing else to do and mostly for the puns.
Many video games have a "I win" cheatcode. Players at large don’t use it. Why not, if winning the game is the goal ?
Even I would understand that as not, actually, you know, winning the game. I mean, a game is a system with rules. No rules, no game, thus no win. And if there's an auto-win button that has no reason to be in the rules other than auto-win, well, obvious hole is obvious.
It's just that I don't care to play a game to begin with.
If something is gamified, meaning that somebody has artificially put a bunch of random stuff I don't care about between me and something I actually want in real life, then I'll try to bypass the game. But I'm not going to do that for points, or badges, or "achievements" that somebody else has decided I should want. I'm not going to push the "win" button. I'm just not gonna play. I loathe gamification.
Creating an ASI-driven UBI paradise is discovering that the developer created a "I Win" button.
I see it not as an "I win" button, but as an "I can do the stuff I care about without having to worry about which random stupid bullshit other people might be willing to pay me for, or about tedious chores that don't interest me" button.
Sure, I'm going to mash that.
And eventually maybe I'll go more transcendent, if that's on offer. I'm even willing to accept certain reasonable mental outlooks to avoid being too "unaligned".
This is the split between Personal Agency and Collective Agency.
I don't even believe "Collective Agency" is a thing, let alone a thing I'd care about. Anything you can reasonably call "agency" requires preferences, and intentional, planned, directed, well, action toward a goal. Collectives don't have preferences and don't plan (and also don't enjoy, or even experience, either the process or the results).
Which, by the way, brings me to the one actual quibble I'm going to put in this. And I'm not sure what to do with that quibble. I don't have a satisfactory course of action and I don't think I have much useful insight beyond what's below. But I do know it's a problem.
One : if there is no recognizable Mormons society in a post-ASI future, something Has Gone Very Wrong.
I was once involved in a legal case that had a lot to do with some Mormons. Really they were a tiny minority of the people affected, but the history was such that the legal system thought they were salient, so they got talked about a lot, and got to talk themselves, and I learned a bit about them.
These particular Mormons were a relatively isolated polygynist splinter sect that treated women, and especially young women, pretty poorly (actually I kind of think everybody but the leaders got a pretty raw deal, and I'm not even sure the leaders were having much of a Good Time(TM)). It wasn't systematic torture, but it wasn't Fun Times either. And the people on the bottom had a whole lot less of what most people would call "agency" than the people on the top.
But they could show you lots of women who truly, sincerely wanted to stay in their system. That was how they'd been raised and what they believed in. And they genuinely believed their Prophet got direct instructions from God (now and then, not all the time).
Nobody was kept in chains. Anybody who wanted to leave was free to walk away from their entire family, probably almost every person they even knew by name, and everything they'd ever been taught was important, while defying what at least many of them truly believed was the literal will of God. And of course move somewhere where practically everybody had a pretty alien way of life, and most people were constantly doing things they'd always believed were hideously immoral, and where they'd been told people were doing worse than they actually were.
They probably would have been miserable if they'd been forcibly dragged out of their system. They might never have recovered. If they had recovered, it might well have meant they'd had experiences that you could categorize as brainwashing.
It would have been wrong to yank them out of their system. So far I'm with you.
But was it right to raise them that way? Was it right to allow them to be raised that way? What kind of "agency" did they have in choosing the things that molded them? The people who did mold them got agency, but they don't seem to have gotten much.
As I think you've probably figured out, I'm very big on individual, conscious, thinking, experiencing, wanting agents, and very much against giving mindless aggregates like institutions, groups, or "cultures", anywhere near the same kind of moral weight.
From my point of view, a dog has more right to respect and consideration than a "heritage". The "heritage" is only important because of the people who value it, and that does not entitle it to have more, different people fed to it. And by this I specifically mean children.
A world of diverse enclaves is appealing in a lot of ways. But, in every realistic form I've been able to imagine, it's a world where the enclaves own people.
More precisely, it's a world where "culture" or "heritage", or whatever, is used as an excuse for some people not only to make other people miserable, but to condition them from birth to choose that misery. Children start to look suspiciously like they're just raw material for whatever enclave they happen to be born in. They don't choose the enclave, not when it matters.
It's not like you can just somehow neutrally turn a baby into an adult and then have them "choose freely". People's values are their own, but that doesn't mean they create those values ex nihilo.
I suppose you could fix the problem by switching to reproduction by adult fission, or something. But a few people might see that as a rather abrupt departure, maybe even contrary to their values. And kids are cute.
An organism's biological purpose is not to replicate its genome. Rather, an organism's biological purpose is simply to reproduce.
The phrase "biological purpose", at least in this context, points to a conceptual mess so horrible that there's no chance it will ever mean anything useful at all. Biology doesn't have purposes.
Yeah, I’m curious.
OK...
Some of this kind of puts words in your mouth by extrapolating from similar discussions with others. I apologize in advance for anything I've gotten wrong.
What's so great about failure?
This one is probably the simplest from my viewpoint, and I bet it's the one you'll "get" the least. Because it's basically my not "getting" your view at a very basic level.
Why would you ever even want to be able to fail big, in a way that would follow you around? What actual value do you get out of it? Failure in itself is valuable to you?
Wut?
It feels to me like a weird need to make your whole life into some kind of game to be "won" or "lost", or some kind of gambling addiction or something.
And I do have to wonder if there may not be a full appreciation for what crushing failure really is.
Failure is always an option
If you're in the "UBI paradise", it's not like you can't still succeed or fail. Put 100 years into a project. You're gonna feel the failure if it fails, and feel the success if it succeeds.
That's artificial? Weak sauce? Those aren't real real stakes? You have to be an effete pampered hothouse flower to care about that kind of made-up stuff?
Well, the big stakes are already gone. If you're on Less Wrong, you probably don't have much real chance of failing so hard that you die, without intentionally trying. Would your medieval farmer even recognize that your present stakes are significant?
... and if you care, your social prestige, among whoever you care about, can always be on the table, which is already most of what you're risking most of the time.
Basically, it seems like you're treating a not-particularly-qualitative change as bigger than it is, and privileging the status quo.
What agency?
Agency is another status quo issue.
Everybody's agency is already limited, severely and arbitrarily, but it doesn't seem to bother them.
Forces mostly unknown and completely beyond your control have made a universe in which you can exist, and fitted you for it. You depend on the fine structure constant. You have no choice about whether it changes. You need not and cannot act to maintain the present value. I doubt that makes you feel your agency is meaningless.
You could be killed by a giant meteor tomorrow, with no chance of acting to change that. More likely, other humans could kill you, still in a way you couldn't influence, for reasons you couldn't change and might never learn. You will someday die of some probably unchosen cause. But I bet none of this worries you on the average day. If it does, people will worry about you.
The Grand Sweep of History is being set by chaotically interacting causes, both natural and human. You don't know what most of them are. If you're one of a special few, you may be positioned to Change History by yourself... but you don't know if you are, what to do, or what the results would actually be. Yet you don't go around feeling like a leaf in the wind.
The "high impact" things that you do control are pretty randomly selected. You can get into Real Trouble or gain Real Advantages, but how is contingent, set by local, ephemeral circumstances. You can get away with things that would have killed a caveman, and you can screw yourself in ways you couldn't easily even explain to a caveman.
Yet, even after swallowing all the existing arbitrariness, new arbitrariness seems not-OK. Imagine a "UBI paradise", except each person gets a bunch of random, arbitrary, weird Responsibilities, none of them with much effect on anything or anybody else. Each Responsibility is literally a bad joke. But the stakes are real: you're Shot at Dawn if you don't Meet Your Responsibilities. I doubt you'd feel the Meaning very strongly.
... even though some of the human-imposed stuff we have already can seem too close to a bad joke.
The upshot is that it seems the "important" control people say they need is almost exactly the control they're used to having (just as the failures they need to worry about are suspiciously close to failures they presently have to worry about). Like today's scope of action is somehow automatically optimal by natural law.
That feels like a lack of imagination or flexibility.
And I definitely don't feel that way. There are things I'd prefer to keep control over, but they're not exactly the things I control today, and don't fall neatly into (any of) the categories people call "meaningful". I'd probably make some real changes in my scope of control if I could.
What about everybody else?
It's all very nice to talk about being able to fail, but you don't fail in a vacuum. You affect others. Your "agentic failure" can be other people's "mishap they don't control". It's almost impossible to totally avoid that. Even if you want that, why do you think you should get it?
The Universe doesn't owe you a value system
This is a bit nebulous, and not dead on the topic of "stakes", and maybe even a bit insulting... but I also think it's related in an important way, and I don't know a better way to say it clearly.
I always feel a sense that what people who talk about "meaning" really want is value realism. You didn't say this, but this is what I feel like I see underneath practically everybody's talk about meaning:
Gosh darn it, there should be some external, objective, sharable way to assign Real Value to things. Only things that have Real Value are "meaningful".
And if there is no such thing, it's important not to accept it, not really, not on a gut level...
... because I need it, dammit!
Say that or not, believe it or not, feel it or not, your needs, real or imagined, don't mean anything to the Laws that Govern All. They don't care to define Real Value, and they don't.
You get to decide what matters to you, and that means you have to decide what matters to you. Of course what you pick is ultimately caused by things you don't control, because you are caused by things you don't control. That doesn't make it any less yours. And it won't exactly match anybody else.
... and choosing to need the chance to fail, because it superficially looks like an externally imposed part of the Natural Order(TM), seems unfortunate. I mean, if you can avoid it.
"But don't you see, Sparklebear? The value was inside of YOU all the time!"
Yes. That’s really my central claim.
OK, I read you and essentially agree with you.
Two caveats, which I expect you've already noticed yourself:
- There are going to be conflicts over human values in the non-AGI, non-ASI world too. Delaying AI may prevent them from getting even worse, but there's still blood flowing over these conflicts without any AI at all. Which is both a limitation of the approach and perhaps a cost in itself.
- More generally, if you think your values are going to largely win, you have to trade off caution, consideration for other people's values, and things like that, against the cost of that win being delayed.[1]
I think a lot of people have that. There’s a even meme for that "It ain’t much, but it’s honest work".
All in one, I don’t think either of us has much more evidence that a vague sense of things anyway ? I sure don’t have.
So far as I know, there are no statistics. My only guess is that you're likely talking about a "lot" of people on each side (if you had to reduce it to two sides, which is of course probably oversimplifying beyond the bounds of reason).
[...] "my agency is meaningful if and only if I have to take positive, considered action to ensure my survival, or at least a major chunk of my happiness".
I think that’s the general direction of the thing we’re trying to point, yes ?
I'll take your word for it that it's important to you, and I know that other people have said it's important to them. Being hung up on that seems deeply weird to me for a bunch of reasons that I could name that you might not care to hear about, and probably another bunch of reasons I haven't consciously recognized (at least yet).
If you give me the choice of living the life of a medieval farmer or someone who has nothing in his life but playing chess, I will take the former.
OK, here's one for you. An ASI has taken over the world. It's running some system that more or less matches your view of a "meaningless UBI paradise". It sends one of its bodies/avatars/consciousness nodes over to your house, and it says:
"I/we notice that you sincerely think your life is meaningless. Sign here, and I/we will set you up as a medieval farmer. You'll get land in a community of other people who've chosen to be medieval farmers (you'll still be able to lose that land under the rules of the locally prevailing medieval system). You'll have to work hard and get things right (and not be too unlucky), or you'll starve. I/we will protect your medieval enclave from outside incursion, but other than that you'll get no help. Obviously this will have no effect on how I/we run the rest of the world. If you take this deal, you can't revoke it, so the stakes will be real."[2]
Would you take that?
The core of the offer is that the ASI is willing to refrain from rescuing you from the results of certain failures, if you really want that. Suppose the ASI is willing to edit the details to your taste, so long as it doesn't unduly interfere with the ASI's ability to offer other people different deals (so you don't get to demand "direct human control over the light cone" or the like). Is there any variant that you'd be satisfied with?
Or does having to choose it spoil it? Or is it too specific to that particular part of the elephant?
Does "growing as a person" sounds like a terminal goal to you ?
Yes, actually. One of the very top ones.
Is "real stakes" easier to grasp than Agency/Meaningfulness ? Or have I just moved confusion around ?
It's clear and graspable.
I don't agree with it, but it helps with the definition problem, at least as far as you personally are concerned. At least it resolves enough of the definition problem to move things along, since you say that the "elephant" has other parts. Now I can at least talk about "this trunk you showed me and whatever's attached to it in some way yet to be defined".
Well, the problem is that there is so much concepts, especially when you want to be precise, and so few words.
Maybe it's just an "elephant" thing, but I still get the feeling that a lot of it is a "different people use these words with fundamentally different meanings" thing.
Cutting down to the parts where I conceivably might have anything interesting to say, and accepting that further bloviation from me may not be interesting...
I notice that if you give me everything else, Hedonistic Happiness, Justice, Health, etc. and take away Agency (which means having things to do that go beyond "having a hobby"),
This is kind of where I always get hung up when I have this discussion with people.
You say "go beyond 'having a hobby'". Then I have to ask "beyond in what way?". I still have no way to distinguish the kind of value you get from a hobby from the kind of value that you see as critical to "Agency". Given any particular potentially valuable thing you might get, I can't tell whether you'll feel it confers "Agency".
I could assume that you mean "Agency is having things to do that are more meaningful than hobbies", and apply your definition of "meaningful". Then I have "Agency is having things to do that produce more terminal, I-just-like-it value than hobbies, independent of altruistic concerns". But that still doesn't help me to identify what those things would be.[1]
I can put words in your mouth and assume that you mean for "Agency" to include the common meaning of "agency", in addition to the "going beyond hobbies" part, but it still doesn't help me.
I think the common meaning of "agency" is something close to "the ability to decide to take actions that have effects on the world", maybe with an additional element saying the effects have to resemble the intention. But hobbies do have effects on the world, and therefore are exercises of agency in that common meaning, so I haven't gotten anywhere by bringing that in.
If I combine the common meaning of "agency" with what you said about "Agency", I get something like "Agency is the ability to take actions that have effects on the world beyond 'having a hobby'". But now I'm back to "beyond in what way?". I can again guess that "beyond having a hobby" means "more meaningful than having a hobby", and apply your definition of meaningful again, and end up with something like "Agency is the ability to take actions that have effects on the world that produce more terminal value than hobbies".
... but I still don't know how to actually identify these things that have effects more terminally valuable than those of hobbies, because I can't identify what effects you see as terminally valuable. So I still don't have a usable definition of "Agency". Or of "meaningful", since that also relies on these terminal values that are not themselves defined.
When I've had similar discussions with other people, I've heard some things that might identify values like that. I remember hearing things close to "my agency is meaningful if and only if I have to take positive, considered action to ensure my survival, or at least a major chunk of my happiness". I think I've also heard "my agency is meaningful if and only if my choices at least potentially affect the Broad Sweep of History(TM)", generally with no real explanation of what's "Broad" enough to qualify.
I don't know if you'd agree that those are the terminal values you care about, though. And I tend to see both of them as somewhere between wrong and outright silly, for multiple different reasons.
I've also heard plenty of people talk about "meaningfulness" in ways that directly contradict your definition. Their definitions often seem to be cast entirely in terms of altruism: "my agency is meaningful if and only if other people are significantly reliant on what I do". Apparently also in a way that affects those other people's survival or a quite significant chunk of their happiness.
There's also a collective version, where the person doesn't demand that their own choices or actions have any particular kind of effect, or at least not any measurable or knowable effect, but only that they somehow contribute to some kind of ensemble human behavior that has a particular kind of effect (usually the Broad Sweep of History one). This makes even less sense to me.
... and I've heard a fair amount of what boils down to "I know meaningful when I see it, and if you don't, that's a defect in you". As though "meaningfulness" were an intrinsic, physical, directly perceptible attribute like mass or something.
So I'm still left without any useful understanding of what shared sense "meaningful" has for the people who use the word. I can't actually even guess what specific things would be meaningful to you personally. And now I also have a problem with "Agency".
First, I believe with those answers that I went too far in the Editorializing vs Being Precise tradeoff with the term "Butlerian Jihad", without even explaining what I mean. I will half-apologize for that, only half because I didn’t intend the "Butlerian Jihad" to actually be the central point; the central point is about how we’re not ready to tackle the problem of Human Values, but current AI timelines force us to.
I get the sense that you were just trying to allude to the ideas that--
- Even if you have some kind of "alignment", blindly going full speed ahead with AI is likely to lead to conflict between humans and/or various human value systems, possibly aided by powerful AI or conducted via powerful AI proxies, and said conflict could be seriously Not Good.
- Claims that "democratic consensus" will satisfactorily or safely resolve such conflicts, or even resolve them at all, are, um, naively optimistic.
- It might be worth it to head that off by unspecified, but potentially drastic, means of preventing that blind rush ahead with AI, at least for an undetermined amount of time.
If that's what you wanted to express, then OK, yeah.
Contra you and Zvi, I think that if GPT 5 leads to 80% jobs automation, the democratic consensus will be pretty much the Dune version of the Butlerian Jihad.
If "80% jobs automation" means people are told "You have no job, and you have no other source of money, let alone a reliable one. However, you still have to pay for all the things you need.", then I absolutely agree with you that it leads to some kind of jihadish thing. And if you present it people in those terms, it might indeed be an anti-AI type of jihad. But an anti-capitalism type of jihad is also possible and would probably be more in order.
The jihadists would definitely win in the "democratic" sense, and might very well win in the sense of defining the physical outcome.
BUT. If what people hear is instead "Your job is now optional and mostly or entirely unpaid (so basically a hobby), but your current-or-better lifestyle will be provided to you regardless", and people have good reason to actually believe that, I think a jihadish outcome is far less certain, and probably doesn't involve a total AI shutdown. Almost certainly not a total overwhelming indefinite-term taboo. And if such an outcome did happen, it still wouldn't mean it had happened by anything like democratic consensus. You can win a jihad with a committed minority.
Some people definitely have a lot of their self-worth and sense of prestige tied up in their jobs, and in their jobs being needed. But many people don't. I don't think a retail clerk, a major part of whose job is to be available as a smiling punching bag for any customers who decide to be obnoxious, is going to feel too bad about getting the same or a better material lifestyle for just doing whatever they happen to feel like every day.
You seem a bit bitter about my "I won’t expand on that", "too long post", and so on.
Well, snarky anyway. I don't know about "bitter". It just seemed awfully glib and honestly a little combative in itself.
So you're siding with the guy who killed 15 billion non-consenting people because he personally couldn't handle the idea of giving up suffering?
I'm sorry that came off as unduly pugnacious. I was actually reacting to what I saw as similarly emphatic language from you ("I can't believe some of you..."), and trying to forcefully make the point that the alternative wasn't a bed of roses.
So you’re siding with the guy who is going to forcibly wirehead all sentient life in the universe, just because he can’t handle that somewhere, someone is using his agency wrong and suffering as a result?
Well, that's the bitch of the whole thing, isn't it? Your choices are mass murder or universal mind control.[2] Oh, and if you do the mass murder one, you're still leaving the Babyeaters to be mind controlled and have their most important values pretty much neutered. Not that leaving the Babyeaters' values un-neutered wouldn't be even more horrific. There are no nice pretty choices here.
By the way, I am irresistibly drawn to a probably irrelevant digression. Although I do think I understand at least a big part of what you're saying about the Superhappies, and they kind of creep me out too, and I'm not saying I'd join up with them at this particular stage in my personal evolution, they're not classic wireheads. They only have part of the package.
The classic wirehead does nothing but groove on the sensations from the "wire", either forever or until they starve, depending on whether there's some outside force keeping them alive.
On the other hand, we're shown that the Superhappies actively explore, develop technology, and have real curiosity about the world. They do many varied activities and actively look for new ones. They "grow"; they seem to improve their own minds and bodies in a targeted, engineered way. They happily steal other species' ideas (their oddball childbirth kink being a kind of strange take, admittedly). They're even willing to adapt themselves to other species' preferences. They alter the famous Broad Sweep of History on a very respectable scale. They just demand that they always have a good time while doing all of that.
Basically the Superhappies have disconnected the causal system that decides their actions, their actual motivational system, from their reward function. They've gotten off the reinforcement learning treadmill. Whether that's possible is a real question, but I don't think what they've done is captured by just calling them "wireheads".
There's something buried under this frivolous stuff about the story that's real, though:
That being said, what now? Should we fight each other to the death for control of the AGI, to decide whether the universe will have Agency and Suffering, or no Agency and no Suffering?
This may be unavoidable, if not on this issue, then on some other.
I do think we should probably hold off on it until it's clearly unavoidable.
Hard disagree on that (wait, is this the first real disagreement we have?). We can have the Superhappies if we want to (or, for that matter, the Babyeaters). We couldn’t before. The Superhappies do represent a fundamental change.
Well, yes, but I did say "at least not while the 'humans' involved are recognizably like the humans we have now". I guess both the Superhappies and the Babyeaters are like humans in some ways, but not in the ways I had in mind.
And do you notice all the forces and values already arrayed against diversity? It does not bode well for those who value at least some diversity.
I'm not sure how I feel about diversity. It kind of seems peripheral to me... maybe correlated with something important, but not so important in itself.
I haven't actually heard many people suggesting that. [Some kind of ill-defined kumbaya democratic decision making].
That’s the "best guess of what we will do with AGI" from those building AGI.
I think it's more like "those are the words the PR arm of those building AGI says to the press, because it's the right ritual utterance to stop questions those building AGI don't want to have to address". I don't know what they actually think, or whether there's any real consensus at all. I do notice that even the PR arm doesn't tend to bring it up unless they're trying to deflect questions.
It doesn't even explain why hobbies necessarily aren't the ultimate good, the only "meaningful" activity, such that nothing could ever "go beyond" them. OK, you say they're not important by themselves, but you don't say what distinguishes them from whatever is important by itself. To be fair, before trying to do that we should probably define what we mean by "hobbies", which neither I nor you have done. ↩︎
With a big side of the initial human culture coming into the story also sounding pretty creepy. To me, anyway. I don't think Yudkowsky thought it was. And nobody in the story seems to care much about individual, versus species, self-determination, which is kind of a HUGE GIANT DEAL to me. ↩︎
Societies aren't the issue; they're mindless aggregates that don't experience anything and don't actually even have desires in anything like the way a human, or even an animal or an AI, has desires. Individuals are the issue. Do individuals get to choose which of these societies they live in?
I’m pretty sure he doesn’t buy the Christian Paradise of "having no job, only leisure is good actually" either.
This (a) doesn't have anything in particular to do with Christianity, (b) has been the most widely held view among people in general since forever, and (c) seems obviously correct. If you want to rely on the contrary supposition, I'm afraid you're going to have to argue for it.
You can still have hobbies.
I also kinda notice that there is no meaningful place left for humans in that society.
There's that word "meaningful" that I keep hearing everywhere. I claim it's a meaningless word (or at least that it's being used here in a meaningless sense). Please define it in a succinct, relevant, and unambiguous way.
If you believe that the democratic consensus made mostly of normal people will allow you that [Glorious Transhumanist Future], I have a bridge to sell to you.
The democratic consensus also won't allow a Butlerian Jihad, and I don't think you're claiming that it will.
So apparently nobody arguing for either can claim to represent either the democratic consensus or the only alternative to it. What's your point?
If you don’t have a plan then don’t build AGI, pretty please?
I agree there.
This is obviously wrong. I won’t argue for why it is wrong — too long post, and so on.
I'm actually not sure what you're arguing for or against in this whole section.
Obviously you're not going to "solve human values". Equally obviously, any future, AI or non-AI, is going to be better for some people's values than others. Some values have always won, and some values have always lost, and that will not change. What that has to do with justice destroying the world, I have absolutely no clue.
I think you're trying to take the view that any major change in the "human condition", or in what's "human", is equivalent to the destruction of the world, no matter what benefits it may have. This is obviously wrong. I won't argue for why it's wrong, but now that I've said those magic words, you're bound to accept all my conclusions.
I still can’t believe some of you would side with the Superhappies!
So you're siding with the guy who killed 15 billion non-consenting people because he personally couldn't handle the idea of giving up suffering?
Wrong answers will disempower humans forever at best, reducing them to passive leaves in the wind.
Just like they are now and always have been. The Heat Death of the Universe (TM) is gonna eat ya, regardless of what you do.
Slightly wrong answers won’t go as far as that, but will result in the permanent loss of vast chunks of Human Values — the parts we will decide to discard, consciously or not.
Human Values have been changing, for individuals and in the "average", for as long as there've been humans, including being discarded consciously or unconsciously. Mostly in a pretty aimless, drifting way. This is not new and neither AI nor anything else will fundamentally change it. At least not while the "humans" involved are recognizably like the humans we have now... and changing away from that would be a pretty big break in itself, no?
You build your ASI. You have that big Diverse Plural Assembly that is apparently plan A
I haven't actually heard many people suggesting that.
Sorry; I'm not in the habit of reading the notifications, so I didn't see the "@" tag.
I don't have a good answer (which doesn't change the underlying bad prospects for securing the data). I think I'd tend to prefer "mitigating risks after potential model theft", because I believe "convince key actors" is fundamentally futile. The kind of security you'd need, if it's possible, would basically shut them down. Which is equivalent to abandoning the "key actor" role to whoever does not implement that kind of security.
Unfortunately, "key actors" would also have to be convinced to "mitigate risks", which they're unlikely to do because that would require them to accept that their preventative measures are probably going to fail. So even the relatively mild "go ahead and do it, but don't expect it to work" is probably not going to happen.
Well, OK, but you also said "actually helps humanity", which assumes some kind of outside view. And you used "aligned" without specifying any particular one of the conflicting visions of "alignment" that are out there.
I absolutely agree that "aligned with whom" is a huge issue. It's one of the things that really bugs me about the word.
I do also agree that there are going to be irreconcilable differences, and that, barring mind surgery to change their opinions, many people will be unhappy with whatever happens. That applies no matter what an AI does, and in fact no matter what anybody who's "in charge" does. It applies even if nobody is in charge. But if somebody is in charge, it's guaranteed that a lot of people will be very angry at that somebody. Sometimes all you can change is who is unhappy.
For example, a whole lot of Christians, Muslims, and possibly others believe that everybody who doesn't wholeheartedly accept their religion is not only wrong, but also going to suffer in hell for eternity. Those religions are mutually contradictory at their cores. And a probably smaller but still large number of atheists believe that all religion is mindrot that intrinsically reduces the human dignity of anybody who accepts it.
You can't solve that, no matter how smart you are. Favor one view and the other view loses. Favor none, and all of the views say that a bunch of people are seriously harmed, even if it's voluntary. It doesn't even matter how you favor a view. Gentle persuasion is still a problem. OK, technically you can avoid people being mad about it after the fact by extreme mind surgery, but you can't reconcile their original values. You can prevent violent conflict by sheer force, but you can't remove the underlying issue.
Still, a lot of the approaches you describe are pretty ham-handed even if you agree with the underlying values. Some of the desired outcomes you list even sound to me like good ideas... but you ought to be able to work toward those goals, even achieve them, without doing it in a way that pisses off the maximum possible number of people. So I guess I'm reacting to the extreme framing and the extreme measures. I don't think the Taliban actively want people to be mad.
[Edited unusually heavily after posting because apparently I can't produce coherent, low-typo text in the morning]
Um, most of those don't sound very "aligned". I think perhaps a very idiosyncratic definition of that word is in play here...
To the extent that I understand your position, it's that sharing a lot of values doesn't automatically imply that AI is safe/non-dystopian to your values if built, rather than saying that alignment is hard/impossible to someone's values (note when I say that a model is aligned, I am always focused on aligning it to one person's values).
Yes, with the caveat that I am not thereby saying that it's not hard to align to even one person's values.
By this, are you not assuming that keeping humans in charge is extremely unlikely to result in a short-term catastrophe? You may not get a million years or even a hundred years.
By the way, I think the worst risk from human control isn't extinction. The worst, and more likely, risk is some kind of narrow, fanatical value system being imposed universally, very possibly by direct mind control. I'd expect "safeguards" to be set up to make sure that the world won't drift away from that system... not even in a million years. And the collateral damage from the safeguards would probably be worse than the limitations imposed by the base value system.
I would expect the mind control to apply more to the humans "in charge" than to the rest.
More generally, a crux here is that I believe most of the alignment-relevant parts of the AIs are in large part the data it was trained on, combined with me believing that the adversarial examples where human language doesn't track reality are less important for alignment than a lot of people think, and thus training on human data does implicitly align them 50-70% of the way towards human values at minimum.
I have trouble with the word "alignment", although even I find myself slipping into that terminology occasionally now. What I really want is good behavior. And as you say, that's good behavior by my values. Which I hope are closer to the values of the average person with influence over AI development than they are to the values of the global average human.
Since I don't expect good behavior from humans, I don't think it's adequate to have AI that's even 100 percent aligned, in terms of behaviorally revealed preferences, with humans-in-general as represented by the training data. A particular danger for AI is that it's pretty common for humans, or even significant groups of humans, to get into weird corner cases and obsess over particular issues to the exclusion of things that other humans would think are more important... something that's encouraged by targeted interventions like RLHF. Fanatically "aligned" AI could be pretty darned dystopian. But even "alignment" with the average person could result in disaster.
If you look at it in terms of stated preferences instead of revealed preferences, I think it gets even worse. Most of ethical philosophy looks to me like humans trying to come up with post hoc ways to make "logical necessities" out of values and behaviors (or "intuitions") that they were going to prefer anyway. If you follow the implications of the resulting systems a little bit beyond wherever their inventors stopped thinking, they usually come into violent conflict with other intuitions that are often at least as important.
If you then add the caveat that it's only 50 to 70 percent "aligned"... well, would you want to have to deal with a human that only agreed with you 50 to 70 percent of the time on what behavior was good? Especially on big issues? I think that, on most ways of "measuring" it, the vast majority of humans are probably much better than 50 to 70 percent "aligned" with one another... but humans still aren't mutually aligned enough to avoid massive violent conflicts over stated values, let alone massive violent conflicts over object-level outcomes.
I think a crux here is I genuinely don't think that we'd inevitably destroy/create a permanent dystopia with ASI by default (assuming it's controlled/aligned, which I think is pretty likely), but I do think it's reasonably plausible, so the main thing I'm more or less objecting to is the certainty involved here, rather than its plausibility.
I don't think it's inevitable, but I do think it's the expected outcome. I agree I'm more suspicious of humans than most people, but obviously I also think I'm right.
People wig out when they get power, even collectively. Trying to ride herd on an AxI is bound to generate stress, tax cognitive capacities, and possibly engender paranoia. Almost everybody seems to have something they'd do if they were King of the World that a substantial number of other people would see as dystopian. One of the strong tendencies seems to be the wish to universalize rightthink, and real mind control might become possible with plausible technology. Grand Visions, moral panics, and purity spirals often rise to pandemic levels, but are presently constrained by being impossible to fully act on. And once you have the Correct World Order on the Most Important Issue, there's a massive impulse to protect it regardless of any collateral damage.
the alignment problem being noticeably easier to solve than 10 years ago
I'm really unconvinced of that. I think people are deceived by their ability to get first-order good behavior in relatively constrained circumstances. I'm definitely totally unconvinced that any of the products that are out there now are "aligned" with anything importantly useful, and they are definitely easy mode.
Also, that's without annoying complications like having to expect the model to advise you on things you literally can't comprehend. I can believe that you and an ASI might end up agreeing on something, but when the ASI can't convey all the information you'd need to have a truly informed opinion, who's aligned with whom? How is it supposed to avoid manipulating you, no matter whether it wants to, if it has to reduce a set of ideas that fundamentally won't fit into your head into something you can give it an opinion on?
Mind you, I don't know how to do "friendliness" any more than I know how to do "intent alignment". But I know which one I'd pick.
[Oh, and on edit to be clear, what I was asking for with the original post was not so much to abandon human control as obviously unacceptable, no matter how suspicious I am of it personally. It was to stop treating any solution that didn't involve human control as axiomatically unacceptable, without regard to other outcomes. If somebody does solve friendliness, use it, FFS, especially if that solution actually turns out to be more reliable than any available alternative human-control solution.]
I tend to think that...
... if you operate humans grossly out of distribution by asking them to supervise or control ASI, or even much better than human AGI...
... and if their control is actively meaningful in that they're not just being manipulated to have the ASI do exactly what it would want to do anyway...
... then even if the ASI is actively trying to help as much as it can under that constraint...
... you'll be lucky to have 1e1 years before the humans destroy the world, give up the control on purpose, lose the control by accident, lock in some kind of permanent (probably dystopian) stasis that will prevent the growth you suggest, or somehow render the entire question moot.
I also don't think that humans are physically capable of doing much better than they do now, no matter how long they have to improve. And I don't think that anything augmented enough to do substantially better would qualify as human.
Please, I beg you guys, stop fretting about humans "losing control over the light cone", or the like.
Humans, collectively, may get lucky enough to close off some futures where we immediately get paperclipped or worse[1].
That, by itself, would be unusually great control.
Please don't overconstrain it with "Oh, and I won't accept any solution where humans stop being In Charge". Part of the answer may be to put something better In Charge. In fact it probably is. Is that a risk? Yes. Stubborn, human-chauvinistic refusal is probably a much bigger risk.
To get a better future, you may have to commit to it, no take-backsies and no micromanaging.
Any loss is mostly an illusion anyway. Humans have influenced history, at least the parts of history that humans most care about, and in big ways. But humans have never had much control.
You can take an action, even an important one. You can rarely predict its effects, not for long, not in the details, and not in the always very numerous and important areas you weren't actively planning for. Causal chains get chaotic very, very fast. Events interact in ways you can't expect to anticipate. It's worse when everything's changing at once, and the effects you want have to happen in a radically different world.
Metaphors about being "in the driver's seat" should notice that the vehicle has no brakes, and sometimes takes random turns by itself. The roads are planless and winding, in a forest, in the fog, in an unknown country, with no signs, no map and no clear destination. The passengers don't agree about why they're on the trip. And since we're talking about humans, I think I have to add that the driver is drunk.
Not having control, and accepting that, is not going to somehow "crush the human spirit". I think most people, the ones who don't see themselves as Elite World Changers, long ago made peace with their personal lack of control. They may if anything take some solace from the fact that even the Elite World Changers still don't have much. Elite World Changers, being human, are best presumed dangerous.
Please join them. To whatever small degree you, I, or we do have control over the shared future, please don't fall victim to the pretense that we're the best possible holders of that control, let alone the only acceptable ones.
I mean, assuming we're even worrying about the right things. The human track record there is mixed. ↩︎
Why worry about what happens under a capitalist system, when very powerful AI that didn't like the outcomes of such a system would probably just remove it entirely, or reconstitute it in a way that didn't recognize human property rights to begin with? I mean, the existing system doesn't recognize AI property rights at all, and you seem to assume that the AI would have the leverage to change that.
For that matter, even if you had a post-AGI or post-ASI system where humans owned all of the capital, it's almost certain that that ownership would be distributed in a way that most humans would feel was grossly unfair. So the humans would also be trying to change the system.
The elimination of capitalism seems like it's very much on the minor end of the possible range of post-ASI changes.
Well, yeah. But there are reasons why they could. Suppose you're them...
- Maybe you see a "FOOM" coming soon. You're not God-King yet, so you can't stop it. If you try to slow it down, others, unaligned with you, will just FOOM first. The present state of research gives you two choices for your FOOM: (a) try for friendly AI, or (b) get paperclipped. You assign very low utility to being paperclipped. So you go for friendly AI. Ceteris paribus, your having this choice becomes more likely if research in general is going toward friendliness and less likely if research in general is going toward intent alignment.
- Maybe you're afraid of what being God-King would turn you into, or you fear making some embarrassingly stupid decision that switches you to the "paperclip" track, or you think having to be God-King would be a drag, or you're morally opposed, or all of the above. Most people will go wrong eventually if given unlimited power, but that doesn't mean they can't stay non-wrong long enough to voluntarily give up that power for whatever reason. I personally would see myself on this track. Unfortunately I suspect that the barriers to being in charge of a "lab" select against it, though. And I think it's also less likely if the prospective "God-King" is actually a group rather than an individual.
- Maybe you're forced, or not "in charge" any more, because there's a torches-and-pitchforks-wielding mob or an enlightened democratic government or whatever. It could happen.
Doesn't it bother you that most of the people thinking deeply and in detail about what that future may look like seem to be putting most of their probability on dystopian answers?