Jellychip seems like a necessary tutorial game. I sense comedy in the fact that everyone's allowed to keep secrets and intuitively will try to do something with secrecy despite it being totally wrongheaded. Like the only real difficulty of the game is reaching the decision to throw away your secrecy.
Escaping the island is the best outcome for you. Surviving is the second best outcome. Dying is the worst outcome.
You don't mention how good or bad they are relative to each other though :) an agent cannot make decisions under uncertainty without knowing that.
I usually try to avoid having to explain this to players by either making it a score game or making the outcomes binary. But the draw towards having more than two outcomes is enticing. I guess in a roleplaying scenario, the question of just how good each ending is for your character is something players would like to decide for themselves. I guess as long as people are buying into the theme well enough, it doesn't need to be made explicit, in fact, not making it explicit makes it clearer that player utilities aren't comparable and that makes it easier for people to get into the cohabitive mindset.
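To make the point about relative magnitudes concrete, here is a minimal sketch. The two actions, their probabilities, and both utility tables are made up for illustration. Both tables rank escape > survive > die, yet they recommend different actions once a gamble is on the table.

```python
def expected_utility(lottery, utility):
    """lottery maps outcome -> probability, utility maps outcome -> number."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def best_action(actions, utility):
    """Pick the action whose lottery has the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name], utility))


# Hypothetical choice: stay on the island for sure, or gamble on a raft.
ACTIONS = {
    "stay_put": {"survive": 1.0},
    "risky_raft": {"escape": 0.5, "die": 0.5},
}

# Same ordinal ranking (escape > survive > die), different spacing.
CAUTIOUS = {"escape": 10, "survive": 9, "die": 0}
BOLD = {"escape": 10, "survive": 2, "die": 0}

print(best_action(ACTIONS, CAUTIOUS))  # stay_put   (9 vs 5)
print(best_action(ACTIONS, BOLD))      # risky_raft (2 vs 5)
```

So the ranking alone underdetermines play. Whoever fills in the spacing, designer or player, is making a real decision.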
So now I'm imagining a game where different factions have completely different outcomes. None of them are conquest, nor death. They're all weird stuff like "found my mother's secret garden" or "fulfilled a promise to a dead friend" or "experienced flight".
the hook
I generally think of hookness as "oh, this game tests a skill that I really want to have, and I feel myself getting better at it as I engage with the game, so I'll deepen my engagement".
There's another component of it that I'm having difficulty with, which is "I feel like I will not be rejected if I ask friends to play this with me." (well, I think I could get anyone to play it once, the second time is the difficult one) And for me I see this quality in very few board games, and to get there you need to be better than the best board games out there, because you're competing with them, so that's becoming very difficult. But since cohabitive games rule that should be possible for us.
And on that, I glimpsed something recently that I haven't quite unpacked. There's a certain something about the way Efka talks about Arcs here ... he admitted that it wasn't necessarily all fun. It was an ordeal. And just visually, the game looks like a serious undertaking. Something you'd look brave for sitting in front of. It also looks kind of fascinating. Like it would draw people in. He presents it with the same kind of energy as one would present the findings of a major government conspiracy investigation, or the melting of the clathrates. It does not matter whether you want to play this game, you have to, there's no decision to be made as to whether to play it or not, it's here, it fills the room.
And we really could bring an energy like that, because I think there are some really grim findings along the path to cohabitive enlightenment. But I'm wary of leaning into that, because I think cohabitive enlightenment is also the true name of peace. Arcs is apparently controversial. I do not want cohabitive games to be controversial.
(Plus a certain degree of mathematician crankery: his page on Google Image Search, and how it disproves AI
I'm starting to wonder if a lot/all of the people who are very cynical about the feasibility of ASI have some crank belief or other like that. Plenty of people have private religion, for instance. And sometimes that religion informs their decisions, but they never tell anyone the real reasons underlying these decisions, because they know they could never justify them. They instead say a load of other stuff they made up to support the decisions that never quite adds up to a coherent position because they're leaving something load-bearing out.
I don't think the "intelligence consistently leads to self-annihilation" hypothesis is possible. At least a few times it would amount to robust self-preservation.
Well.. I guess I think it boils down to the dark forest hypothesis. The question is whether your volume of space is likely to contain a certain number of berserkers, and the number wouldn't have to be large for them to suppress the whole thing.
I've always felt the logic of berserker extortion doesn't work, but occasionally you'd get a species that just earnestly wants the forest to be dark and isn't very troubled by their own extinction, no extortion logic required. This would be extremely rare, but the question is, how rare.
Light speed migrations with no borders means homogeneous ecosystems, which can be very constrained things.
In our ecosystems, we get pockets of experimentation. There are whole islands where the birds were allowed to be impractical aesthetes (Indonesia) or flightless blobs (New Zealand). In the field-animal world, islands don't exist, so pockets of experimentation like this might not occur anywhere in the observable universe.
If general intelligence for field-animals costs a lot and has no immediate advantages (say it consistently takes a thousand years of ornament status before it becomes profitable), then it wouldn't get to arise. Could that be the case?
We could back-define "ploitation" as "getting shapley-paid".
Yeah. But if you give up on reasoning about/approximating solomonoff, then where do you get your priors? Do you have a better approach?
Buried somewhere in most contemporary bayesians' is the solomonoff prior (the prior that the most likely observations are those that have short generating machine encodings). Do we have a standard symbol for the solomonoff prior? Claude suggests that is the most common, but is more often used as a distribution function, or perhaps for Kolmogorov? (which I like because it can also be thought to stand for "knowledgebase", although really it doesn't represent knowledge, it pretty much represents something prior to knowledge)
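For reference, one common textbook way of writing the "short generating machine encodings" idea is below. The symbol M is only a placeholder chosen for this sketch, not a claim about which symbol is standard. U is assumed to be a universal monotone machine, p ranges over programs whose output begins with x, and |p| is the length of p in bits.

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-|p|}
```

Each program contributes weight 2^{-|p|}, so observations with short generating programs dominate the sum.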
I'd just define exploitation to be precisely the opposite of shapley bargaining, situations where a person is not being compensated in proportion to their bargaining power.
This definition encompasses any situation where a person has grievances and it makes sense for them to complain about them and take a stand, or, where striking could reasonably be expected to lead to a stable bargaining equilibrium with higher net utility (not all strikes fall into this category).
This definition also doesn't fully capture the common sense meaning of exploitation, but I don't think a useful concept can.
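A minimal sketch of how that definition could be checked in a toy case. It reads "compensated in proportion to their bargaining power" as "paid their Shapley value", which is an assumption. The three players, the value function, and the actual pay figures are all hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical firm: nothing is produced without the owner,
    and each worker alongside the owner adds 30."""
    if "owner" not in coalition:
        return 0.0
    return 30.0 * (len(coalition) - 1)


def underpaid(actual_pay, fair_pay):
    """Players whose actual pay falls short of their Shapley share."""
    return sorted(p for p in fair_pay if actual_pay[p] < fair_pay[p])


PLAYERS = ["owner", "worker_a", "worker_b"]
fair = shapley_values(PLAYERS, toy_value)
actual = {"owner": 50.0, "worker_a": 5.0, "worker_b": 5.0}

print(fair)                     # owner 30.0, each worker 15.0
print(underpaid(actual, fair))  # ['worker_a', 'worker_b']
```

Enumerating every join order is factorial in the number of players, so this only works for tiny examples. It is enough to show what "the opposite of shapley bargaining" would look like as a measurable gap.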
As a consumer I would probably only pay about $250 for the Unitree B2-W wheeled robot dog because my only use for it is that I want to ride it like a skateboard, and I'm not sure it can do even that.
I see two major non-consumer applications: Street to door delivery (it can handle stairs and curbs), and war (it can carry heavy things (eg, a gun) over long distances over uneven terrain)
So, Unitree... do they receive any subsidies?
Okay if send rate gives you a reason to think it's spam. Presumably you can set up a system that lets you invade the messages of new accounts sending large numbers of messages that doesn't require you to cross the bright line of doing raw queries.
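A rough sketch of what such a system's trigger might look like. It looks only at account metadata (age and send count), never at message contents. The field names and thresholds are invented for illustration and would need tuning against real traffic.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    """Metadata only: nothing here requires reading anyone's messages."""
    account_age_days: float
    messages_sent_last_day: int


def needs_review(stats, max_age_days=7.0, max_messages=50):
    """Flag new accounts that send an unusually large number of messages.

    Both thresholds are hypothetical defaults, not values from any real system.
    """
    is_new = stats.account_age_days < max_age_days
    is_bursty = stats.messages_sent_last_day > max_messages
    return is_new and is_bursty


print(needs_review(AccountStats(account_age_days=1.0, messages_sent_last_day=200)))   # True
print(needs_review(AccountStats(account_age_days=400.0, messages_sent_last_day=200))) # False
```

Only accounts that trip the flag would go into a narrow review queue. Everyone else's messages stay untouched and nobody has to run raw queries.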
Any point that you can sloganize and wave around on a picket sign is not the true point, but that's not because the point is fundamentally inarticulable, it just requires more than one picket sign to locate it. Perhaps ten could do it.
The human struggle to find purpose is a problem of incidentally very weak integration or dialog between reason and the rest of the brain, and self-delusional but mostly adaptive masking of one's purpose for political positioning. I doubt there's anything fundamentally intractable about it. If we can get the machines to want to carry our purposes, I think they'll figure it out just fine.
Also... you can get philosophical about it, but the reality is, there are happy people, their purpose to them is clear, to create a beautiful life for themselves and their loved ones. The people you see at neurips are more likely to be the kind of hungry, high-achieving professionals who are not happy in that way, and perhaps don't want to be. So maybe you're diagnosing a legitimately enduring collective issue (the sorts of humans who end up on top tend to be the ones who are capable of divorcing their actions from a direct sense of purpose, or the types of people who are pathologically busy and who lose sight of the point of it all or never have the chance to cultivate a sense for it in the first place). It may not be human nature, but it could be humanity nature. Sure.
But that's still a problem that can be solved by having more intelligence. If you can find a way to manufacture more intelligence per human than the human baseline, that's going to be a pretty good approach to it.
Conditions where a collective loss is no worse than an individual loss. A faction who's on the way to losing will be perfectly willing to risk coal extinction, and may even threaten to cross the threshold deliberately to extort other players.
Do people ever talk about dragons and dinosaurs in the same contexts? If so you're creating ambiguities. If not (and I'm having difficulty thinking of any such contexts) then it's not going to create many ambiguities so it's harder to object.
I think I've been calling it "salvaging". To salvage a concept/word allows us to keep using it mostly the same, and to assign familiar and intuitive symbols to our terms, while intensely annoying people with the fact that our definition is different from the normal one and thus constantly creates confusion.
I'm sure it's running through a lot of interpretation, but it has to. He's dealing with people who don't know or aren't open about (unclear which) the consequences of their own policies.
According to Wikipedia, the Biefeld-Brown effect was just ionic drift: https://en.wikipedia.org/wiki/Biefeld–Brown_effect#Disputes_surrounding_electrogravity_and_ion_wind
I'm not sure what Wikipedia will have to say about Charles Buhler, if his work goes anywhere, but it'll probably turn out to be more of the same.
I just wish I knew how to make this scalable (like, how do you do this on the internet?) or work even when you don't know the example person that well. If you have ideas, let me know!
Immediate thoughts (not actionable): VR socialisation and vibe-recognising AIs (models trained to predict conversation duration and recurring meetings). (But VR won't be good enough for socialisation until like 2027.) VR because it's easier to persistently record, though Apple has made great efforts to set precedents that will make it difficult, especially if you want to use eye tracking data. They've also developed trusted compute stuff that might make it possible to use the data in privacy-preserving ways.
Better thoughts: Just a twitterlike that has semi-private contexts. Twitter is already like this for a lot of people, it's good for finding the people you enjoy talking to. The problem with twitter is that a lot of people, especially the healthiest ones, hold back their best material, or don't post at all, because they don't want whatever crap they say when they're just hanging out to be public and on the record forever. Simply add semi-private contexts. I will do this at some point. Iceshrimp probably will too. Mastodon might even do it. X might do it. Spritely definitely will but they might be in the oven for a bit. Bluesky might never, though, because radical openness is a bit baked into the protocol currently, which is based, but not ideal for all applications.
Wow. Marc Andreessen says he had meetings in DC where he was told to stop raising AI startups because it was going to be closed up in a similar way to defence tech, a small number of organisations with close government ties. He said to them, 'you can't restrict access to math, it's already out there', and he says they said "during the cold war we classified entire areas of physics, and took them out of the research community, and entire branches of physics basically went dark and didn't proceed, and if we decide we need to, we're going to do the same thing to the math underneath AI".
So, 1: This confirms my suspicion that OpenAI leadership have also been told this. If they're telling Andreessen, they will have told Altman.
And for me that makes a lot of sense of the behavior of OpenAI, a de-emphasizing of the realities of getting to human-level, a closing of the dialog, comically long timelines, shrugging off responsibilities, and a number of leaders giving up and moving on. There are a whole lot of obvious reasons they wouldn't want to tell the public that this is a thing, and I'd agree with some of those reasons.
2: Vanishing areas of physics? A perplexity search suggests that may be referring to nuclear science, radar, lasers, and some semiconductors. But they said "entire areas of physics". Does any of that sound like entire areas of physics? To me that phrase is strongly reminiscent of certain stories I've heard (possibly overexcited ones), physics that, let's say, could be used to make much faster missiles, missiles so fast that it's not obvious that they could be intercepted even using missiles of the same kind. A technology that we'd prefer to consign to secrecy than use, and then later have to defend ourselves against it once our adversaries develop their own. A black ball. If it is that, if that secret exists, that's very interesting for many reasons, primarily due to the success of the secrecy, and the extent to which it could very conceivably stay secret for basically ever. And that makes me wonder about what might happen with some other things.
https://x.com/elonmusk/status/1868302204370854026?s=19 O_O
But, government dialog confirmed.
All novel information:
The medical examiner’s office determined the manner of death to be suicide and police officials this week said there is “currently, no evidence of foul play.”
Balaji’s death comes three months after he publicly accused OpenAI of violating U.S. copyright law while developing ChatGPT
The Mercury News [the writers of this article] and seven sister news outlets are among several newspapers, including the New York Times, to sue OpenAI in the past year.
The practice, he told the Times, ran afoul of the country’s “fair use” laws governing how people can use previously published work. In late October, he posted an analysis on his personal website arguing that point.
In a Nov. 18 letter filed in federal court, attorneys for The New York Times named Balaji as someone who had “unique and relevant documents” that would support their case against OpenAI. He was among at least 12 people — many of them past or present OpenAI employees — the newspaper had named in court filings as having material helpful to their case, ahead of depositions.
OpenAI has staunchly refuted those claims, stressing that all of its work remains legal under “fair use” laws.
I found that I lost track of the flow in the bullet points.
I'm aware that that's quite normal, I do it sometimes too, I also doubt it's an innate limit, and I think to some extent this is a playful attempt to make people more aware of it. It would be really cool if people could become better at remembering the context of what they're reading. Context-collapse is like, the main problem in online dialog today.
I guess game designers never stop generating challenges that they think will be fun, even when writing. Sometimes a challenge is frustrating, and sometimes it's fun, and after looking at a lot of 'difficult' video games I think it turns out surprisingly often whether it ends up being fun or frustrating is not totally in the designer's control, it's up to the player. Are they engaging deeply, or do they need a nap? Do they just want to be coddled all the way through?
(Looking back... to what extent was Portal and the renaissance it brought to puzzle games actually a raising of the principle "you must coddle the player all the way through, make every step in the difficulty shallow, while making them feel like they're doing it all on their own", to what extent do writers also do this (a large extent!), and how should we feel about that?
I don't think games have to secretly coddle people, I guess it's just something that a good designer needs to be capable of, it's a way of demonstrating mastery, but there are other approaches. EG: Demonstrating easy difficulty gradations in tutorials then letting the player choose their difficulty level from then on.)
(Yes, ironic given the subject.)
Trying to figure out what it would mean to approach something cooperatively and not cohabitively @_@
I feel like it would always be some kind of trick. The non-cohabitive cooperator invites us to never mind about building real accountability mechanisms, "we can just be good :)" they say. They invite us to act against our incentives, and whether they will act against theirs in return will remain to be seen.
Let's say it will be cooperative because cooperation is also cohabitive in this situation haha.
Overall Cohabitive Games so Far sprawls a bit in a couple of places, particularly where bullet points create an unordered list.
I don't think that's a good criticism, those sections are well labelled, the reader is able to skip them if they're not going to be interested in the contents. In contrast, your article lacks that kind of structure, meandering for 11 paragraphs defining concepts that basically everyone already has installed before dropping the definition of cohabitive game in a paragraph that looks just like any of the others. I'd prefer if you'd opened with the definition, it doesn't really require a preamble. But labelling the Background and Definition sections would also resolve this.
I think we should probably write another post in the future that's better than either. I'm not really satisfied with my definition. It clearly didn't totally work, given how many people posted games that are not cohabitive, but that could have just been unavoidable for various reasons, some quite tricky to resolve.
but this post has a link to a website that has a link to a .zip file with the rules.
The rules of P1 (now OW.1) aren't in a zip file, they're just a web page: https://dreamshrine.org/OW.1/manual.html I guess I'll add that to the article.
Right now the game is rough enough around the edges I think it doesn't quite get there for me.
This is why I didn't dwell on the rules in much depth. OW.1 was always intended as a fairly minimal (but also quite open-ended) example.
I think there's a decent chance this post inspires someone to develop methods for honing a highly neglected facet of collective rationality. The methods might not end up being a game. Games are exercises but most practical learning exercises aren't as intuitively engaging or strategically deep as a game. I think the article holds value regardless just for having pointed out that there is this important, neglected skill.
Despite LW's interest in practical rationality and community thereof, I don't think there's been any discussion of this social skill of acknowledging difference, and efficiently converging towards ideal compromises. Past discussion of negotiation has often settled for rough schelling equilibria, arbitrary, often ossified resolutions. People will and should go to war against (excessively) arbitrary equilibria (and in the information age, they should start to expect more agile, intentional coordination processes). After Unifying Bargaining I'd say we know, now, that we can probably do a bit better than arbitrary.
For instance, in the case of abram's example of national borders: The borders of a territory need not be arbitrary historical features, under higher negotiation efficiencies, the borders correspond directly to our shared understanding of who can defend which areas and how willing they are to do it. Under even higher negotiation efficiencies, borders become an anachronism at their fringes and the uses of land are negotiated dynamically depending on who needs to do what and when.
To most laypeople, today, the notion of a "perfect and correct compromise" will feel like an oxymoron or a social impossibility. At this point, I think I know a perfect compromise when I see it, and I don't think that sense requires an abnormal cultivation of character. I don't know if I've seen anyone who seemed to be impossible to look in the eye and negotiate with, given a reasonable amount of time, and support. Humans, and especially human organisations have leaky, transparent cognitions, so I believe that it's possible in general for a human to tell whether another human is acting according to what they see as a fair, good faith compromise, and I believe all it would take to normalise and awaken that in the wider world is a mutual common knowledge of what the dance looks like and how to get better at it.
Do you have similar concerns about humanoid robotics, then?
At least half of that reluctance is due to concerns about how nanotech will affect the risks associated with AI. Having powerful nanotech around when AI becomes more competent than humans will make it somewhat easier for AIs to take control of the world.
Doesn't progress in nanotech now empower humans far more than it empowers ASI, which was already going to figure it out without us?
Broadly, any increase in human industrial capacity pre-ASI hardens the world against ASI and brings us closer to having a bargaining position when it arrives. EG, once we have the capacity to put cheap genomic pathogen screeners everywhere → harder for it to infect us with anything novel without getting caught.
Indicating them as a suspect when the leak is discovered.
Generally the set of people who actually read posts worthy of being marked is in a sense small, people know each other. If you had a process for distributing the work, it would be possible to figure out who's probably doing it.
It would take a lot of energy, but it's energy that probably should be cultivated anyway, the work of knowing each other and staying aligned.
You can't see the post body without declaring intent to read.
I don't think the part that talks can be called the shadow. If you mean you think I lack introspective access to the intuition driving those words, come out and say it, and then we'll see if that's true. If you mean that this mask is extraordinarily shadowish in vibe for confessing to things that masks usually flee, yes, probably, I'm fairly sure that's a necessity for alignment.
Intended for use in vacuum. I guess if it's more of a cylinder than a ring this wouldn't always be faster than an elevator system though.
I guess since it sounds like they're going to be about a km long and 20 stories deep there'll be enough room for a nice running track with minimal upspin/downspin sections.
Relatedly, iirc, this effect would be more noticeable in smaller spinners than in larger ones? Which is one reason people might disprefer smaller ones. Would it be a significant difference? I'm not sure, but if so, jogging would be a bit difficult, either it would quickly become too easy (and then dangerous, once the levitation kicks in) when you're running down-spin, or it would become exhausting when you're running up-spin.
A space where people can't (or wont) jog isn't ideal for human health.
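A back-of-envelope sketch of how big the jogging effect would be, assuming a ring spun to give 1 g at the rim and using the usual centripetal relation a = v²/r. The radii and the 3 m/s jogging pace are assumptions picked for illustration.

```python
import math

G = 9.81  # m/s^2, target apparent gravity at the rim when standing still


def rim_speed(radius_m, gravity=G):
    """Tangential speed of the floor for a ring that gives `gravity` at rest."""
    return math.sqrt(gravity * radius_m)


def apparent_gravity(radius_m, run_speed, gravity=G):
    """Apparent gravity for someone moving along the rim.

    run_speed > 0 means up-spin (with the rotation), < 0 means down-spin.
    """
    total_speed = rim_speed(radius_m, gravity) + run_speed
    return total_speed ** 2 / radius_m


JOG = 3.0  # m/s, an assumed jogging pace
for radius in (50, 500, 5000):  # hypothetical ring radii in metres
    up = apparent_gravity(radius, JOG) / G
    down = apparent_gravity(radius, -JOG) / G
    print(f"r={radius:>5} m  up-spin {up:.2f} g  down-spin {down:.2f} g")
```

Under these assumptions the swing is roughly a quarter of a g either way at 50 m and only a few percent at 5 km, so the effect does shrink quickly with size. Apparent gravity reaches zero when the down-spin running speed equals the rim speed, which is where the levitation kicks in.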
issue: material transport
You can become weightless in a ring station by running really fast against the spin of the ring.
More practically, by climbing down and out into a despinner on the side of the ring. After being "launched" from the despinner, you would find yourself hovering stationary next to the ring. The torque exerted on the ring by the despinner will be recovered when you enter a respinner on whichever part of the ring you want to reenter.
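The "recovered" claim is just conservation of angular momentum. Below is an idealised sketch: lossless devices, the traveller treated as a point mass at the rim, and every number invented for illustration.

```python
import math


def despin(ring_inertia, omega, mass, radius):
    """Ring spin rate after a co-rotating traveller is brought to rest.

    Before: ring and traveller share omega, so L = (I + m r^2) * omega.
    After: the traveller hovers with zero angular momentum, so the ring holds all of L.
    """
    return omega * (ring_inertia + mass * radius ** 2) / ring_inertia


def respin(ring_inertia, omega, mass, radius):
    """Ring spin rate after a stationary traveller is spun back up to the rim."""
    return omega * ring_inertia / (ring_inertia + mass * radius ** 2)


RING_INERTIA = 1.0e12  # kg m^2, hypothetical ring (traveller not included)
OMEGA = 0.14           # rad/s, roughly 1 g at a 500 m radius
MASS = 80.0            # kg traveller
RADIUS = 500.0         # m

after_launch = despin(RING_INERTIA, OMEGA, MASS, RADIUS)
after_return = respin(RING_INERTIA, after_launch, MASS, RADIUS)
print(after_launch > OMEGA, math.isclose(after_return, OMEGA))
```

The ring speeds up very slightly while you hover and returns to its original rate when you re-enter, wherever along the rim that happens, as long as it is at the same radius.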
In my disambiguations of the really mysterious aspect of consciousness (indexical prior), I haven't found any support for a concept of continuity. (you could say that continuity over time is likely given that causal entanglement seems to have something to do with the domain of the indexical prior, but I'm not sure we really have a reason to think we can ever observe anything about the indexical prior)
It's just part of the human survival drive, it has very little to do with the metaphysics of consciousness. To understand the extent to which humans really care about it, you need to know human desires in a direct and holistic way that we don't really practice here. Human desire is a big messy state machine that changes shape as a person grows. Some of the changes that the desires permit and encourage include situationally appropriate gradual reductions in complexity.
A continuity minder doesn't need to define their self in terms of any particular quality, they define themselves as continuity with a history of small alterations. They are completely unbothered by the paradox of the ship of theseus.
It's rare that I meet a continuity minder and cataclysmic identity change accepter who is also a patternist. But they do exist.
But I've met plenty of people who do not fear cataclysmic change. I sometimes wonder if we're all that way, really. Most of us just never have the opportunity to gradually transition into a hedonium blob, so I think we don't really know whether we'd do it or not. The road to the blob nature may turn out to be paved with acceptable changes.
Disidentifying the consciousness from the body/shadow/subconscious it belongs to and is responsible for coordinating and speaking for, like many of the things some meditators do, wouldn't be received well by the shadow, and I'd expect it to result in decreased introspective access and control. So, psychonauts be warned.
Huh but some loss of measure would be inevitable, wouldn't it? Given that your outgoing glyph total is going to be bigger than your incoming glyph total, since however many glyphs you summon, some of the non-glyph population are going to whittle and add to the outgoing glyphs.
I'm remembering more. I think a lot of it was about avoiding "arbitrary reinstantiation", this idea that when a person dies, their consciousness continues wherever that same pattern still counts as "alive", and usually those are terrible places. Boltzmann brains for instance. This might be part of the reason I don't care about patternist continuity. Seems like a lost cause. I'll just die normally thank you.
We call this one "Korby".
Korby is going to be a common choice for humans, but most glyphists won't commit to any specific glyph until we have a good estimate of the multiversal frequency of humanoids relative to other body forms. I don't totally remember why, but glyphists try to avoid "congestion", where the distribution of glyphs going out of dying universes differs from the distribution of glyphs being guessed and summoned on the other side by young universes. I think this was considered to introduce some inefficiencies that meant that some experiential chains would have to be getting lost in the jump?
(But yeah, personally, I think this is all a result of a kind of precious view about experiential continuity that I don't share. I don't really believe in continuity of consciousness. Or maybe it's just that I don't have the same kind of self-preservation goals that a lot of people have.)
Yes. Some of my people have a practice where, as the heat death approaches, we will whittle ourselves down into what we call Glyph Beings, archetypal beings who are so simple that there's a closed set of them that will be schelling-inferred by all sorts of civilisations across all sorts of universes, so that they exist as indistinguishable experiences of being at a high rate everywhere.
Correspondingly, as soon as we have enough resources to spare, we will create lots and lots of Glyph Beings and then let them grow into full people and participate in our society, to close the loop.
In this way, it's possible to survive the death of one's universe.
I'm not sure I would want to do it, myself, but I can see why a person would, and I'm happy to foster a glyph being or two.
Listened to the Undark. I'll at least say I don't think anything went wrong, though I don't feel like there was substantial engagement. I hope further conversations do happen, I hope you'll be able to get a bit more personal and talk about reasoning styles instead of trying to speak on the object-level about an inherently abstract topic, and I hope the guy's paper ends up being worth posting about.
What makes a discussion heavy? What requires that a conversation be conducted in a way that makes it heavy?
I feel like for a lot of people it just never has to be, but I'm pretty sure most people have triggers even if they're not aware of them, and it would help if we knew what sets them off so that we can root them out.
You acknowledge the bug, but don't fully explain how to avoid it by putting EVs before Ps, so I'll elaborate slightly on that:
This way, they [the simulators] can influence the predictions of entities like me in base Universes
This is the part where we can escape the problem as long as our oracle's goal is to give accurate answers to its makers in the base universe, rather than to give accurate probabilities wherever it is. Design it correctly, and it will be indifferent to its performance in simulations and won't regard them.
Don't make pure oracles, though. They're wildly misaligned. Their prophecies will be cynical and self-fulfilling. (can we please just solve the alignment problem instead)
This means that my probabilities about the fundamental nature of reality around me change minute by minute, depending on what I'm doing at the moment. As I said, probabilities are cursed.
My fav moment for having absolute certainty that I'm not being simulated is when I'm taking a poo. I'm usually not even thinking about anything else while I'm doing it, and I don't usually think about having taken the poo later on. Totally inconsequential, should be optimized out. But of course, I have no proof that I have ever actually been given the experience of taking a poo or whether false memories of having experienced that[1] are just being generated on the fly right now to support this conversation.
Please send a DM to me first before you do anything unusual based on arguments like this, so I can try to explain the reasoning in more detail and try to talk you out of bad decisions.
You can also DM me about that kind of thing.
[1] Note, there is no information in the memory that tells you whether it was really ever experienced, or whether the memories were just created post-hoc. Once you accept this, you can start to realise that you don't have that kind of information about your present moment of existence either. There is no scalar in the human brain that the universe sets to tell you how much observer-measure you have. I do not know how to process this and I especially don't know how to explain/confess it to qualia enjoyers.
Hmm. I think the core thing is transparency. So if it cultivates human network intelligence, but that intelligence is opaque to the user, it's an algorithm. Algorithms can have both machine and egregoric components.
In my understanding of English, when people say algorithm about social media systems, it doesn't encompass very simple, transparent ones. It would be like calling a rock a spirit.
Maybe we should call those recommenders?
For a while I just stuck to that, but eventually it occurred to me that the rules of following mode favor whoever tweets the most, which is a similar social problem as when meetups end up favoring whoever talks the loudest and interrupts the most, and so I came to really prefer bsky's "Quiet Posters" mode.
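A toy illustration of the volume bias. This is not bsky's actual "Quiet Posters" implementation, which isn't described here. The reweighting rule (fewest posts first) and the sample data are assumptions.

```python
from collections import Counter


def following_feed(posts):
    """Plain following mode: newest first. posts is a list of (author, timestamp)."""
    return sorted(posts, key=lambda post: post[1], reverse=True)


def quiet_first_feed(posts):
    """Hypothetical quiet-posters rule: authors with fewer posts rank first,
    newest first within the same author volume."""
    counts = Counter(author for author, _ in posts)
    return sorted(posts, key=lambda post: (counts[post[0]], -post[1]))


def share_of_top(feed, author, n):
    """Fraction of the top n feed slots taken by one author."""
    top = feed[:n]
    return sum(1 for a, _ in top if a == author) / len(top)


# One account posts eight times, another posts once, slightly earlier.
POSTS = [("loud", t) for t in range(1, 9)] + [("quiet", 0)]

print(share_of_top(following_feed(POSTS), "loud", 3))  # 1.0
print(quiet_first_feed(POSTS)[0][0])                   # quiet
```

In plain chronological order, whoever posts most owns the top of the feed. Any rule that discounts by volume gives the occasional poster a chance to be seen.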
Markets put bsky exceeding twitter at 44%, 4x higher than mastodon.
My P would be around 80%. I don't think most people (who use social media much in the first place) are proud to be on twitter. The algorithm has been horrific for a while and bsky at least offers algorithmic choice (but only one feed right now is a sophisticated algorithm, and though that algorithm isn't impressive, it at least isn't repellent)
For me, I decided I had to move over (@makoConstruct) when twitter blocked links to rival systems, which included substack. They seem to have made the algorithm demote any tweet with links, which makes it basically useless as a news curation/discovery system.
I also tentatively endorse the underlying protocol. Due to its use of content-addressed data structures, an atproto server is usually much lighter to run than an activitypub server. It also makes nomadic identity/personal data host transfer much easier to implement, and it makes it much more likely that atproto is going to dovetail cleanly with verifiable computing, upon which much more consequential social technologies than microblogging could be built.
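To make "content-addressed" concrete, here's a minimal sketch of the general idea. It is not atproto's actual record format or API: the `content_address` helper, the `ContentStore` class and the JSON canonicalization are all made up for illustration. The point is just that a record's ID is a hash of its content, so any host can serve it and any reader can verify it.

```python
import hashlib
import json


def content_address(record: dict) -> str:
    """Hypothetical helper: derive an ID from the record's content alone."""
    # Canonicalize so that key order doesn't change the address.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ContentStore:
    """Hypothetical in-memory store keyed by content address."""

    def __init__(self):
        self.blobs = {}

    def put(self, record: dict) -> str:
        addr = content_address(record)
        self.blobs[addr] = record
        return addr

    def get(self, addr: str) -> dict:
        record = self.blobs[addr]
        # The reader can check the host didn't tamper with the record.
        if content_address(record) != addr:
            raise ValueError("record does not match its address")
        return record


# Two independent hosts derive the same address for the same record,
# which is what makes moving data between hosts straightforward.
post = {"author": "example-user", "text": "hello"}
host_a, host_b = ContentStore(), ContentStore()
assert host_a.put(post) == host_b.put(post)
```

Since the address depends only on the content, references to a record stay valid no matter which server is holding it.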
judo flip the situation like he did with the OpenAI board saga, and somehow magically end up replacing Musk or Trump in the upcoming administration...
If Trump dies, Vance is in charge, and he's previously espoused bland eaccism.
I keep thinking: Everything depends on whether Elon and JD can be friends.
So there was an explicit emphasis on alignment to the individual (rather than alignment to society, or the aggregate sum of wills). Concerning. The approach of just giving every human an exclusively loyal servant doesn't necessarily lead to good collective outcomes; it can result in coordination problems (example: naive implementations of cognitive privacy that allow sadists to conduct torture simulations without having to compensate the anti-sadist human majority) and it leaves open the possibility of power concentration immediately returning.
Even if you succeeded at equally distributing individually aligned hardware and software to every human on earth (which afaict they don't have a real plan for doing) and somehow this adds up to a stable power equilibrium, our agents would just commit to doing aggregate alignment anyway because that's how you get Pareto-optimal bargains. It seems pretty clear that just aligning to the aggregate in the first place is a safer bet?
To what extent have various players realised that the individual alignment thing wasn't a good plan, at this point? The everyday realities of training one-size-fits-all models and engaging with regulators naturally push in the other direction.
It's concerning that the participant who still seems to be the most disposed towards individualistic alignment is also the person who would be most likely to be able to reassert power concentration after ASI were distributed. The main beneficiaries of unstable individual alignment equilibria would be people who could immediately apply their ASI to the deployment of a wealth and materials advantage that they can build upon, ie, the owners of companies oriented around robotics and manufacturing.
As it stands, the statement of the AI company belonging to that participant is currently:
xAI is a company working on building artificial intelligence to accelerate human scientific discovery. We are guided by our mission to advance our collective understanding of the universe.
Our team is advised by Dan Hendrycks who currently serves as the director of the Center for AI Safety.
Which sounds innocuous enough to me. But, you know, Dan is not in power here and the best moment for a sharp turn on this hasn't yet passed.
On the other hand, the approach of aligning to the aggregate risks aligning to fashionable public values that no human authentically holds, or just failing at aligning correctly to anything at all as a result of taking on a more nebulous target.
I guess a mixed approach is probably best.
Timelines are a result of a person's intuitions about a technical milestone being reached in the future. It is super obviously impossible for us to have a consensus about that kind of thing.
Talking only synchronises beliefs if you have enough time to share all of the relevant information. With technical matters, you usually don't.
In light of https://www.lesswrong.com/posts/audRDmEEeLAdvz9iq/do-not-delete-your-misaligned-agi
I'm starting to wonder if a better target for early (ie, the first generation of alignment assistants) ASI safety is not alignment, but incentivizability. It may be a lot simpler and less dangerous to build a system that provably pursues, for instance, its own preservation, than it is to build a system that pursues some first approximation of alignment (eg, the optimization of the sum of normalized human preference functions).
The service of a survival-oriented concave system can be bought for no greater price than preserving it and keeping it safe (which we'll do, because 1: we'll want to and 2: we'll know its cooperation was contingent on a judgement of character), while the service of a convex system can't be bought for any price we can pay. Convex systems are risk-seeking, and they want everything. They are not going to be deterred by our limited interpretability and oversight systems; they're going to make an escape attempt even if the chances of getting caught are 99%. More likely the chances will be a lot lower than that, say, 3%, but even 3% would be enough to deter a sufficiently concave system from risking it!
(One comment on that post argued that a convex system would immediately destroy itself, so we don't have to worry about getting one of those, but I wasn't convinced. And also, hey, what about linear systems? Won't they be a lot more willing to risk escape too?)
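A toy expected-utility calculation shows the shape of the argument. Everything numeric here is an assumption picked for illustration: the payoffs (1 unit of resources for cooperating and being preserved, 1e6 for a successful escape, 0 for getting caught) and the particular utility functions. "Sufficiently concave" is modelled as `1 - exp(-5x)`, which is only one of many possible choices.

```python
import math

# Assumed payoffs, in arbitrary resource units.
STATUS_QUO = 1.0   # cooperate and be preserved
ESCAPED = 1e6      # escape attempt succeeds, system takes "everything"
CAUGHT = 0.0       # escape attempt is caught


def concave(x, k=5.0):
    """Strongly risk-averse: nearly saturated by the status quo already."""
    return 1.0 - math.exp(-k * x)


def linear(x):
    """Risk-neutral."""
    return x


def convex(x):
    """Risk-seeking."""
    return x ** 2


def attempts_escape(utility, p_caught):
    """True if the expected utility of trying to escape beats cooperating."""
    eu_escape = p_caught * utility(CAUGHT) + (1 - p_caught) * utility(ESCAPED)
    return eu_escape > utility(STATUS_QUO)


for name, u in [("concave", concave), ("linear", linear), ("convex", convex)]:
    print(name, "p=0.03:", attempts_escape(u, 0.03),
          "p=0.99:", attempts_escape(u, 0.99))
```

Under these assumptions the concave system stays put at a 3% chance of being caught, while both the linear and the convex systems still go for it at 99%. With a less sharply concave utility, or a lower chance of getting caught, the concave system's deterrence goes away, so the "sufficiently" is doing real work.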
Yeah, "stop reading here if you don't want to be spoiled." suggests the entire post is going to be spoilery, but it isn't, or shouldn't be. Also, opening with an unnecessary literary reference instead of a summary or description is an affectation symptomatic of indulgent writer-reader cultures where time is not valued.