Comments
I suppose that one might be a me thing. I haven't heard others say it, but it was an insight for me at one point that "oh, it hurts because it's an impact". It had the flavor of expecting a metaphor and not getting one.
Your link to "don't do technical ai alignment" does not argue for that claim. In fact, it appears to be based on the assumption that the opposite is true, but that there are a lot of distractor hypotheses for how to do it that will turn out to be an expensive waste of time.
To be clear, I'm expecting scenarios much more clearly bad than that, like "the universe is almost entirely populated by worker drone AIs and there are like 5 humans who are high all the time and not even in a way they would have signed up for, and then one human who is being copied repeatedly and is starkly superintelligent thanks to boosts from their AI assistants but who had replaced almost all of their preferences with an obsession with growth in order to get to being the one who had command of the first AI, and didn't manage to break out of it using that AI, and then got more weird in rapid jumps thanks to the intense things they asked for help with."
like, the general pattern here being: the crucible of competition tends to beat out of you whatever it was you wanted to compete to get, and a sudden huge windfall of a type you have little experience with, one that puts you in a new realm of possibility, will tend to get massively underused and won't end up solving the subtle problems.
Nothing like, "oh yeah humanity generally survived and will be kept around indefinitely without significant suffering".
I mean, we're not going to the future without getting changed by it, agreed. but how quickly one has to figure out how to make good use of a big power jump seems like it has a big effect on how much risk the power jump carries for your ability to actually implement the preferences you'd have had if you didn't rush yourself.
"all" humans? like, maybe no, I expect a few would survive, but the future wouldn't be human, it'd be whatever distorted things those humans turn into. My core take here is that humans generalize basically just as poorly as we expect AIs to, (maybe a little better, but on a log scale, not much), in terms of their preferences still pointing at the things even they thought they did given a huge increase in power. crown wearing the king, drug seeking behavior, luxury messing up people's motivation, etc. if you solve "make an ai be entirely obedient to a single person", then that person needs to be wise enough to not screw that up, and I trust exactly no one to even successfully use that situation to do what they want, nevermind what others around them want. For an evocative cariacature of the intuition here, see rick sanchez.
I would guess that the range of things people propose for the shell game is tractable to get a good survey of. It'd be interesting to try to plot out the system as a causal graph with recurrence so one can point to, "hey look, this kind of component is present in a lot of places", and see if one can get that causal graph visualization to show enough that it starts to feel clear to people why this is a problem. I doubt I'll get to this, but if I play with this, I might try to visualize it [edit: probably with the help of a skilled human visual artist to make the whole chart into an evocative comic] with arrays of arrows vaguely like,
a -> b -> c_1 -> c_1
          ... -> ...
          c_n -> c_n
           |
           v
          d_1 ... d_n
           ^ | | /
           | v v
           f <- e
where c might be, idk, people's bank accounts or something, d might be people's job decisions, e might be an action by some single person, etc. there's a lot of complexity in the world, but it's finite, and not obviously beyond us to display the major interactions. being able to point to the graph and say "I think there are arrows missing here" seems like it might be helpful. it should feel like, when one looks at the part of the causal graph that contains one's own behavior, "oh yeah, that's pretty much got all the things I interact with, in at least an abstract form that seems to capture most of what goes on for me", and that should be generally true for basically anyone with meaningful influence on the world.
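As a gesture at what I mean, a minimal toy sketch of the graph shape (node names are placeholders I'm making up here, not a claim about the real structure; assumes networkx and matplotlib):

```python
# Toy version of the abstracted causal graph with recurrence described above.
# The point is only the shape: shared components, self-loops, and feedback
# between individual actions and aggregate state.
import matplotlib.pyplot as plt
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("a: policy", "b: market conditions"),
    ("b: market conditions", "c: bank accounts"),
    ("c: bank accounts", "c: bank accounts"),        # recurrence: state feeds itself
    ("c: bank accounts", "d: job decisions"),
    ("d: job decisions", "e: one person's action"),
    ("e: one person's action", "d: job decisions"),  # feedback loop
    ("e: one person's action", "f: aggregate outcome"),
])

nx.draw_networkx(G, pos=nx.circular_layout(G), node_color="lightgray",
                 node_size=2200, font_size=7, arrows=True)
plt.axis("off")
plt.show()
```

The real version would have far more node types and would want an actual artist's layout, but even a crude plot like this makes "this kind of component shows up in a lot of places" pointable-at.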
ideally then this could be a simulation that can be visualized as a steppable system. I've seen people make sim visualizations for public consumption - https://ncase.me/, https://www.youtube.com/@PrimerBlobs - it doesn't exactly look trivial to do, but it seems like it'd allow people to grok the edges of normality better: seeing normality generated by a thing that has grounding, and then seeing that thing in another, intuitively-possible parameter setup. It'd help a lot with people who are used to thinking about only one part of a system.
But of course trying to simulate abstracted versions of a large fraction of what goes on on earth sounds like it's only maybe at the edge of tractability for a team of humans with AI assistance, at best.
He appears to be arguing against a thing, while simultaneously criticizing people; but I appreciate that he seems to do it in ways that are not purely negative, also mentioning times things have gone relatively well (specifically, updating on evidence that folks here aren't uniquely correct), even if it's not enough to make the rest of his points not a criticism.
I entirely agree with his criticism of the strategy he's criticizing. I do think there are more obviously tenable approaches than the "just build it yourself lol" approach or the "just don't let anyone build it lol" approach, such as "just figure out why things suck as quickly as possible by making progress on thousand-year-old open questions in philosophy that science has some grip on but has not resolved". I mean, actually I'm not highly optimistic, but it seems quite plausible that what's most promising is just rushing to do the actual research of figuring out how to make constructive and friendly coordination more possible, or even actually reliably happen, especially between highly different beings like humans and AIs, and especially given the real world we actually have now, where things suck and that doesn't happen.
Specifically, institutions are dying and have been for a while, and the people who think they're going to set up new institutions don't seem to be competent enough to pull it off, in most cases. I have the impression that institutions would be dying even without anyone specifically wanting to kill them, but that also seems to be a thing that's happening. Solving this is stuff like traditional politics or economics, etc., from a perspective of something like "human flourishing, e.g. one's own".
Specifically, figuring out how to technically ensure that the network of pressures which keeps humanity very vaguely sane also integrates with AIs, in a way that keeps them in touch with us and inclined to help us keep up and keep participating in/actualizing our various individuals' and groups'/cultures' preferences in society as things get crazier, seems worth doing.
[Edit: crash found in the conversations referenced, we'll talk more in DM but not in a hurry. This comment retracted for now]
By "AGI" I mean the thing that has very large effects on the world (e.g., it kills everyone) via the same sort of route that humanity has large effects on the world. The route is where you figure out how to figure stuff out, and you figure a lot of stuff out using your figure-outers, and then the stuff you figured out says how to make powerful artifacts that move many atoms into very specific arrangements.
delete "it kills everyone", that's a reasonable definition. "it kills everyone" is indeed a likely consequence a ways downstream, but I don't think it's a likely major action of an early AGI, with the current trajectory of levels of alignment (ie, very weak alignment, very not robust, not goal aligned, certainly not likely to be recursively aligned such that it keeps pointing qualitatively towards good things for humans for more than a few minutes after AIs in charge, but not inclined to accumulate power hard like an instant wipeout. but hey, also, maybe an AI will see this, and go, like, hey actually we really value humans being around, so let's plan trajectories that let them keep up with AIs rather than disempowering them. then it'd depend on how our word meanings are structured relative to each other).
we already have AI that does every qualitative kind of thing you say AIs qualitatively can't do; you're just somehow immune to realizing that, for each thing, yes, that'll scale too, modulo some tweaks to get the things to not break when you scale them. requiring the benchmarks to be the point when the hardest things are solved indicates that you're not generalizing from small to large in a way that allows forecasting from research progress. I don't understand why you don't find this obvious by, eg, simply reading the paper lists of major labs and skimming a few papers to see what their details are - I tried to explain it in DM and you dismissed the evidence, yet again, same as MIRI folks always have. This was all obvious literally 10 years ago, nothing significant has changed, everything is on the obvious trajectory you get if intelligence is simple, easy, and compute bound. https://www.lesswrong.com/posts/9Yc7Pp7szcjPgPsjf/the-brain-as-a-universal-learning-machine
@daniel k I just can never remember your last name's spelling, sorry, heh. My point in saying this is that my prediction approach up to 2020 was similar to, though not as refined as, yours, and that instead of trying to argue my views (which differ from yours in a few trivial ways that are mostly not relevant) I'd rather just point people to arguments of yours.
When predicting timelines, it matters which benchmark on the compounding-returns curve you pick. Your definition minus doom happens earlier, even if the minus-doom version is too late to avert doom in literally all worlds (I doubt that; it's more likely that the most powerful humans'[1] Elo against AIs falls and falls but takes a while to become indistinguishable from zero - see the formula note below).
[1] such as their labs' CEOs, major world leaders, highly skilled human strategists, etc.
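(Formula note, added for reference and not part of the original claim: the standard Elo-implied expected score for the human side is

$$E_{\text{human}} = \frac{1}{1 + 10^{(R_{\text{AI}} - R_{\text{human}})/400}},$$

so "indistinguishable from zero" is about this expected score as the rating gap $R_{\text{AI}} - R_{\text{human}}$ grows, not about the rating itself hitting zero.)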
Your definition of AGI is "that which completely ends the game", source in your link. By that definition I agree with you. By others' definition (which is similar but doesn't rely on the game over clause) I do not.
My timelines have gotten slightly longer since 2020: I was expecting TAI when we got GPT-4, and I have recently gone back and discovered I have chatlogs showing I'd been expecting that for years and had specific reasons. I would propose Daniel K. as a particularly good reference.
I should also add:
I'm pretty worried that we can't understand the universe "properly" even if we're in base physics! It's not yet clearly forbidden that the foundations of philosophy contain unanswerable questions: things where there's a true answer that affects our universe in ways that are not exposed in any way physically, and can only be referred to by theoretical reasoning - which then relies on how well our philosophy and logic foundations actually have the real universe as a possible referent. Even if they do, things could be annoying. In particular, one possible annoying hypothesis would be if the universe is in Turing machines, but is quantum - then in my opinion that's very weird, but hey, at least we have a set in which the universe is realizable. Real analysis and some related stuff gives us some idea of things that can be reasoned about from within a computation-based understanding of structure, but which are philosophically-possibly-extant structures beyond computation, and whether true reality can contain "actual infinities" is a classic debate.
So sims are small potatoes, IMO. Annoying simulators that want to actively mess up our understandings are clearly possible but seem not particularly likely by models I believe right now; seems to me they'd rather just make minds within their own universe; sims are for pretending to be another timeline or universe to a mind you want to instantiate, whatever your reason for that pretense. If we can grab onto possible worlds well enough, and they aren't messing up our understanding on purpose, then we can reason about plausible base realities and find out we're primarily in a sim by making universe sims ourselves and discovering the easiest way to find ourselves is if we first simulate some alien civ or other.
But if we can't even in principle have a hypothesis space which relates meaningfully to what structures a universe could express, then phew, that's pretty much game over for trying to guess at tegmark 4 and who might simulate us in it or what other base physics was possible or exists physically in some sense.
My giving up on incomprehensible worlds is not a reassuring move, just an unavoidable one. Similar to accepting that if you die in 3 seconds, you can't do much about it. Hope you don't, btw.
But yeah, currently it seems to me that the majority of sim juice comes from civs who want to get to know the neighbors before they meet, so they can prepare the appropriate welcome mat (tone: cynical). Let's send an actualized preference for strong egalitarianism, yeah? (It doesn't currently look likely that we will; there would need to be a lot of changes from here before that became likely.)
(Also, hopefully everything I said works for either structural realism or mathematical universe. Structural realism without mathematical universe would be an example of the way things could be wacky in ways permanently beyond the reach of logic, while still living in a universe where logic mostly works.)
I think that if our future goes well, it will be because we found ways to align AI well enough, and/or because we coordinated politically to slow or stop AI advancement long enough to accomplish the alignment part
Agree
not because researchers avoided measuring AI's capabilities.
But differential technological development matters, as does making it clear that when you make a capability game like this, you are probably just contributing to capabilities, not doing alignment. I won't say you should never do that, but I'll say that's what's being done. I personally am all in on "we just need to solve alignment as fast as possible". But I've been a capabilities nerd for a while before I was an alignment nerd, and when I see someone doing something that I feel like is accidentally a potentially significant little capabilities contribution, it seems worth pointing out that that's what it is.
Decision theory as discussed here heavily involves thinking about agents responding to other agents' decision processes
Sims are very cheap compared to space travel, and you need to know what you're dealing with in quite a lot of detail before you fly because you want to have mapped the entire space of possible negotiations in an absolutely ridiculous level of detail.
Sims built for this purpose would still be a lot lower detail than reality, but of course that would be indistinguishable from inside if the sim is designed properly. Maybe most kinds of things despawn in the sim when you look away, for example. Only objects which produce an ongoing computation that has influence on the resulting civ would need modeling in detail. Which I suspect would include every human on earth, due to small world effects, the internet, sensitive dependence on initial conditions, etc. Imagine how time travel movies imply the tiniest change can amplify - one needs enough detail to have a good map of that level of thing. Compare weather simulation.
Someone poor in Ghana might die and change the mood of someone working on AI training in Ghana, which subtly affects how the unfriendly AI that goes to space and affects alien civs is produced, or something. Or perhaps there's an uprising when they try to replace all human workers with robots. Modeling what you thought about just now helps predict how good you'll be at the danceoff in your local town, which affects the posts produced as training data on the public internet. Oh, come to think of it, where are we posting, and on what topic? Perhaps they needed to model your life in enough detail to have tight estimates of your posts, because those posts affect what goes on online.
But most of the argument for continuing to model humans seems to me to be the sensitive dependence on initial conditions, because it means you need an unintuitively high level of modeling detail in order to estimate what von Neumann probe wave is produced.
Still cheap - even in base reality, earth right now is only taking up a little more energy than its tiny silhouette against the sun's energy output in all directions. A Kardashev 2 civ would have no problem fuelling an optimized sim with a trillion trillion samples of possible aliens' origin processes. Probably a superintelligent Kardashev 1 civ even finds it quite cheap; it could take less than earth's resources to do the entire sim, including all parallel outcomes.
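To put rough numbers on the "tiny silhouette" point (my own back-of-envelope with standard constants, not from the comment):

```python
# How small a slice of the Sun's output does Earth actually intercept?
import math

R_EARTH = 6.371e6   # m, Earth radius
AU = 1.496e11       # m, Earth-Sun distance
L_SUN = 3.828e26    # W, total solar luminosity

cross_section = math.pi * R_EARTH**2        # Earth's silhouette, ~1.3e14 m^2
sphere_at_1au = 4 * math.pi * AU**2         # sphere the sunlight spreads over
fraction = cross_section / sphere_at_1au    # ~4.5e-10 of the Sun's output

print(f"fraction intercepted: {fraction:.1e}")                # ~4.5e-10
print(f"power intercepted:    {L_SUN * fraction:.1e} W")      # ~1.7e17 W
print(f"K2 budget vs Earth's sunlight: {1 / fraction:.1e}x")  # ~2e9x
```

So a Kardashev 2 civ has on the order of a billion times more raw power than all the sunlight currently hitting earth, which is the sense in which the sim budget looks cheap.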
We have to infer how reality works somehow.
I've been poking at the philosophy of math recently. It really seems like there's no way to conceive of a universe that is beyond the reach of logic, except one that also can't support life. Classic posts include "The Unreasonable Effectiveness of Mathematics", "What Numbers Could Not Be", and a few others. So then we need epistemology.
We can make all sorts of wacky nested simulations, and any interesting ones - ones that can support organisms (that is, ones that are Turing complete) - can also support processes for predicting outcomes in that universe, and those processes appear to necessarily need to do reasoning about what is "simple" in some sense in order to work. So that seems to hint that algorithmic information theory isn't crazy (unless I just hand waved over a dependency loop, which I totally might have done, it's midnight), which means that we can use the equivalence of Turing complete structures to assume we can infer things about the universe. Maybe not Solomonoff induction, but some form of empirical induction. And then we've justified ordinary reasoning about what's simple.
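For concreteness, the standard object behind "maybe not Solomonoff induction, but some form of empirical induction" is the universal prior over a prefix-free universal machine $U$ (textbook definition, added here for reference):

$$M(x) = \sum_{p \,:\, U(p) \text{ outputs something beginning with } x} 2^{-|p|}$$

i.e. hypotheses are programs, and shorter (simpler) programs get exponentially more weight. The hand-wave above is about whether something in this family is forced on any predictor that works at all.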
Okay, so we can reason normally about simplicity. What universes produce observers like us and arise from mathematically simple rules? Lots of them, but it seems to me the main ones produce us via base physics, and then because there was an instance in base physics, we also get produced in neighboring civilizations' simulations of what other things base physics might have done in nearby galaxies so as to predict what kind of superintelligent aliens they might be negotiating with before they meet each other. Or, they produce us by base physics, and then we get instantiated again later to figure out what we did. Ancestor sims require very good outcomes which seem rare, so those branches are lower measure anyway, but also ancestor sims don't get to produce super ai separate from the original causal influence.
Point is, no, what's going on in the simulations is nearly entirely irrelevant. We're in base physics somewhere. Get your head out of the simulation clouds and choose what you do in base physics, not based on how it affects your simulators' opinion of the simulation's moral valence. Leave that sort of crazy stuff to friendly ai, you can't understand superintelligent simulators which we can't even get evidence exist besides plausible but very galaxy brain abstract arguments.
(Oh, might be relevant that I'm a halfer when making predictions, thirder when choosing actions - see anthropic decision theory for an intuition on that.)
If we have no grasp on anything outside our virtualized reality, all is lost. Therefore I discard my attempts to control those possible worlds.
However, the simulation argument relies on reasoning. For it to go through requires that a number of assumptions hold. Those in turn rely on: why would we be simulated? It seems to me the main reason is because we're near a point of high influence in original reality and they want to know what happened - the simulations then are effectively extremely high resolution memories. Therefore, thank those simulating us for the additional units of "existence", and focus on original reality, where there's influence to be had; that's why alien or our future superintelligences would care what happened.
https://arxiv.org/pdf/1110.6437
Basically, don't freak out about simulations. It's not that different from the older concept "history is watching you". Intense, but not world shatteringly intense.
willingness seems likely to be understating it. a context where the capability is even part of the author context seems like a prereq. finetuning would produce that, with fewshot one has to figure out how to make it correlate. I'll try some more ideas.
if it's a fully general argument, that's a problem I don't know how to solve at the moment. I suspect it's not, but that the space of unblocked ways to test models is small. I've been bouncing ideas about this around out loud with some folks over the past day; possibly someone will show up with an idea for how to constrain which benchmarks are worth making soonish. but the direction I see as maybe promising is: what makes a benchmark reliably suck as a bragging rights challenge?
Partially agreed. I've tested this a little personally; Claude successfully predicted their own success probability on some programming tasks, but was unable to report their own underlying token probabilities. The former tests weren't that good, the latter ones were somewhat okay: I asked Claude to say the same thing across 10 branches, and then asked a separate thread of Claude, also downstream of the same context, to verbally predict the distribution.
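Roughly the shape of that second test, as a sketch (my reconstruction, not the exact prompts; the model id and the use of the Anthropic Python SDK here are assumptions):

```python
# Compare Claude's empirical answer distribution across branches with its own
# verbal prediction of that distribution. Sketch only; prompts are placeholders.
from collections import Counter

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-latest"  # placeholder model id
QUESTION = "Name one color. Answer with a single word."

# 1. Sample the same context across 10 independent branches.
samples = []
for _ in range(10):
    resp = client.messages.create(
        model=MODEL, max_tokens=8,
        messages=[{"role": "user", "content": QUESTION}],
    )
    samples.append(resp.content[0].text.strip().lower())

# 2. In a separate thread, downstream of the same context, ask for a verbal prediction.
prediction = client.messages.create(
    model=MODEL, max_tokens=200,
    messages=[{
        "role": "user",
        "content": QUESTION + "\n\nDon't answer. Instead, predict the distribution "
                   "of one-word answers you would give across 10 independent samples.",
    }],
)

print("empirical:", dict(Counter(samples)))
print("self-prediction:", prediction.content[0].text)
```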
As I said elsewhere, https://www.lesswrong.com/posts/LfQCzph7rc2vxpweS/introducing-the-weirdml-benchmark?commentId=q86ogStKyge9Jznpv
This is a capabilities game. It is neither alignment or safety. To the degree it's forecasting, it helps cause the thing it forecasts. This has been the standard pattern in capabilities research for a long time: someone makes a benchmark (say, imagenet 1.3m 1000class), and this produces a leaderboard that allows people to show how good their learning algorithm is at novel datasets. In some cases this even produced models directly that were generally useful, but it traditionally was used to show how well an algorithm would work in a new context from scratch. Building benchmarks like this gives teams a new way to brag - they may have a better source of training data (eg, google always had a better source of training data than imagenet), but it allows them to brag that they scored well on the benchmark, which among other things helps them get funding.
Perhaps it also helps convince people to be concerned. That might trade off against this. Perhaps it sucks in some way as a bragging rights challenge. That would trade off against this.
Hopefully it sucks as a bragging rights challenge.
What will you do if nobody makes a successful case?
barring anything else you might have meant, temporarily assuming yudkowsky's level of concern if someone builds yudkowsky's monster, then evidentially speaking, it's still the case that "if we build AGI, everyone will die" is unjustified in a world where it's unclear if alignment is going to succeed before someone can build yudkowsky's monster. in other words, agreed.
A question in my head is what range of fixed points are possible in terms of different numeric ("monetary") economic mechanisms and contracts. Seems to me those are a kind of AI component that has been in use since before computers.
Ownership is enforced by physical interactions, and only exists to the degree the interactions which enforce it do. Those interactions can change.
As Lucius said, resources in space are unprotected.
Organizations which hand more of their decision-making to sufficiently strong AIs "win" by making technically-legal moves, at the cost of probably also attacking their owners. Money is a general power coupon accepted by many interactions; ownership deeds are a more specific, narrow one. if the AI systems which enforce these mechanisms don't systemically reinforce towards outcomes where the things available to buy actually satisfy the preferences of the remaining humans who own AI stock or land, then the owners can end up with no non-deadly food and a lot of money, while datacenters grow and grow, taking up energy and land with (semi?-)autonomously self-replicating factories or the like - if money-like exchange continues to be how the physical economy is managed in AI-to-AI interactions, these self-replicating factories might end up adapted to make products that the market will buy. but if the majority of the buying power is held by AI-controlled corporations, then figuring out how to best manipulate those AIs into buying is the priority. If it isn't, then manipulating humans into buying is the priority.
It seems to me that the economic alignment problem of guaranteeing that everyone is able to reliably spend money only on things that actually match their own preferences, so that sellers can't gain economic power by customer manipulation, is an ongoing serious problem, and one that ends up being the weak link in scenarios where AIs manage an economy using numeric abstractions and contracts (money, ownership, rent) similar to the current ones.
Your original sentence was better.
I'll just ask Claude to respond to everything you've said so far:
Let me extract and critique the core claims from their long response, focusing on what's testable and mechanistic:
Key Claims:
1. AI agents working together could achieve "non-linear" problem-solving capacity through shared semantic representations
2. This poses an alignment risk if AIs develop internal semantic representations humans can't interpret
3. The AI safety community's emphasis on mathematical/empirical approaches may miss important insights
4. A "decentralized collective intelligence" framework is needed to address thisCritical Issues:
1. The mechanism for "semantic backpropagation" and "non-linear scaling" is never specified mathematically. What's the actual claimed growth rate? What's the bottleneck? Without these specifics, it's impossible to evaluate.
2. The "reasoning types" discussion (System 1/2) misapplies dual process theory. The relevant question isn't about reasoning styles, but about what precise claims are being made and how we could test them.
3. No clear definition is given for "decentralized collective intelligence" - what exactly would make a system qualify? What properties must it have? How would we measure its effectiveness?
Suggested Focus:
Instead of broad claims about cognitive science and collective intelligence, the OP should:

1. Write out the claimed semantic backpropagation algorithm in pseudocode
2. Specify concrete numerical predictions about scaling behavior
3. Design experiments to test these predictions
4. Identify falsifiable conditions

Right now, the writing pattern suggests someone pattern-matching to complex systems concepts without grounding them in testable mechanisms. The core ideas might be interesting, but they need to be made precise enough to evaluate.
I generally find AIs are much more helpful for critiquing ideas than for generating them. Even here, you can see Claude was pretty wordy and significantly repeated what I'd already said.
I think there's not even the slightest hint at any beyond-pure-base-physics stuff going on
in us, either
Would love to see a version of this post which does not involve ChatGPT whatsoever, only involves Claude to the degree necessary and never to choose a sequence of words that is included in the resulting text, is optimized to be specific and mathematical, and makes its points without hesitating to use LaTeX to actually get into the math. And expect the math to be scrutinized closely - I'm asking for math so that I and others here can learn from it to the degree it's valid, and pull on it to the degree it isn't. I'm interested in these topics and your post hasn't changed that interest, but it's a lot of words and I can't figure out if there's anything novel underneath the pile of marketing stuff. How would you make your entire point in 10 words? 50? 200?
Fractals are in fact related in some ways, but this sounds like marketing content; it doesn't have the actual careful reasoning necessary for the insights you're near to be usable. I feel like they're pretty mundane insights anyhow - any dynamical system with a positive Lyapunov exponent generates a shape with fractal dimension in its phase portrait (toy calculation at the end of this comment). That sounds fancy with all those technical words, but actually it isn't saying a ton. It does say something, but a great many dynamical systems of interest have a positive Lyapunov exponent at least in some parameter configurations, and that isn't magic. The specific claims seem to check out somewhat to me: yup, the world, and AIs in particular, are a complex chaotic system. but it feels like saying "fractal" doesn't tell us new interesting things about that, it's just a hype phrasing. The high ratio of self-cites gives me a similar feeling. Complex systems folks seem to have a tendency to get all attached to keywords, like this sentence:
Fractal intelligence integrates cognitive science, graph theory, knowledge representation, and systems thinking.
Integrates... how? Did chatgpt write that? Like, I'm being critical because I think there's something here, but the hype approach seems like it doesn't do the mundane points justice. Calling it "fractal intelligence" seems like buzzword bingo.
but I don't think your post is worthy of mass downvotes, it's hyped up marketing speak for something that has some degree of real relevance. would be interested to see how you'd distill this down to an eli15 or such.
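The toy calculation mentioned above - a standard estimate of the Lyapunov exponent of the logistic map, just to show what the "chaotic, hence fractal structure" point cashes out to (my own example, nothing specific to the post):

```python
# Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x).
# A positive value is the chaos condition; r=4.0 gives ~ln(2) ≈ 0.69,
# while r=3.2 (a period-2 orbit) gives a negative exponent.
import math

def lyapunov_logistic(r, x0=0.4, n=100_000, burn_in=1_000):
    x = x0
    total = 0.0
    for i in range(n + burn_in):
        x = r * x * (1 - x)
        if i >= burn_in:
            total += math.log(abs(r * (1 - 2 * x)) + 1e-300)  # ln|f'(x)|
    return total / n

for r in (3.2, 3.9, 4.0):
    print(f"r={r}: lambda ≈ {lyapunov_logistic(r):.3f}")
```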
Bit of a tangent, but topical: I don't think language models are individual minds. My current max-likelihood mental model is that part of the base-level suggestibility is because the character level is highly uncertain, due to being a model of the characters of many humans. I agree that the character level appears to have some properties of personhood. Language models are clearly morally relevant in some forms; most obviously I see them as a reanimation of a blend of other minds, but it's not clear what internal phenomena are negative for the reanimated mind. The equivalence to slavery seems to me better expressed by saying they approximately reanimated mind-defining data without the consent of the minds being reanimated; the way people express this is normally to say things like "stolen data".
Due to community input, I've deleted my comment. Thanks for letting me know.
Say I'm convinced. Should I delete my post? (edit 1: I am currently predicting "yes" at something like 70%, and if so, will do so. ... edit 4: deleted it. DM if you want the previous text)
but how would we do high intensity, highly focused research on something intentionally restructured to be an "AI outcomes" research question? I don't think this is pointless - agency research might naturally talk about outcomes in a way that is general across a variety of people's concerns. In particular, ethics and alignment seem like they're an unnatural split, and outcomes seems like a refactor that could select important problems from both AI autonomy risks and human agency risks. I have more specific threads I could talk about.
perhaps. but my reasoning is something like -
better than "alignment": what's being aligned? outcomes should be (citation needed)
better than "ethics": how does one act ethically? by producing good outcomes (citation needed).
better than "notkilleveryoneism": I actually would prefer everyone dying now to everyone being tortured for a million years and then dying, for example, and I can come up with many other counterexamples - not dying is not the problem, achieving good things is the problem.
might not work for deontologists. that seems fine to me, I float somewhere between virtue ethics and utilitarianism anyway.
perhaps there are more catchy words that could be used, though. hope to see someone suggest one someday.
Do bacteria need to be VNM agents?
How about ducks?
Do ants need to be VNM agents?
How about anthills?
Do proteins need to be VNM agents?
How about leukocytes?
Do dogs need to be VNM agents?
How about trees?
Do planets (edit: specifically, populated ones) need to be VNM agents?
How about countries?
Or neighborhoods?
Or interest groups?
Or families?
Or companies?
Or unions?
Or friend groups?
Art groups?
For each of these, which of the assumptions of the VNM framework break, and why?
How do we represent preferences which are not located in a single place?
Or not fully defined at a single time?
What framework lets us natively represent a unit of partially specified preference? If macro agency arises from what Michael Levin calls "agential materials", how do we represent how the small scale selfhood aggregates?
At what scale does agency arise, how do we know, and how are preferences represented?
Pasting the above to Claude gets mildly interesting results. I'd be interested in human thoughts.
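(For reference, the assumptions the questions above are pointing at, in their standard textbook form: a preference relation $\succeq$ over lotteries $A, B, C$ satisfying

$$\begin{aligned}
&\textbf{Completeness:} && A \succeq B \ \text{or}\ B \succeq A \\
&\textbf{Transitivity:} && A \succeq B,\ B \succeq C \implies A \succeq C \\
&\textbf{Continuity:} && A \succeq B \succeq C \implies \exists\, p \in [0,1]:\ pA + (1-p)C \sim B \\
&\textbf{Independence:} && A \succeq B \implies pA + (1-p)C \succeq pB + (1-p)C \quad \forall\, p \in (0,1],\ \forall\, C
\end{aligned}$$

The theorem then gives a utility function whose expectation represents $\succeq$. The questions are partly about which of these premises, plus the implicit assumption of a single located decision-maker over a fixed lottery space, fail for each system.)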
How about "AI outcomes"
The first step would probably be to avoid letting the existing field influence you too much. Instead, consider from scratch what the problems of minds and AI are, how they relate to reality and to other problems, and try to grab them with intellectual tools you're familiar with. Talk to other physicists and try to get into exploratory conversation that does not rely on existing knowledge. If you look at the existing field, look at it like you're studying aliens anthropologically.
the self referential joke thing
"mine some crypt-"
there's a contingent who would close it as soon as someone used an insult focused on intelligence, rather than on intentional behavior. to fix for that subcrowd, "idiot" becomes "fool"
those are the main ones, but then I sometimes get "tldr" responses, and even when I copy out the main civilization story section, I get "they think the authorities could be automated? that can't happen" responses, which I think would be less severe if the buildup to that showed more of them struggling to make autonomous robots work at all. Most people on the left who dislike AI think it doesn't and won't work, and any claim that it does needs to be in tune with reality about how AI currently looks, if it's going to predict that that eventually changes. the story spends a lot of time on making discovering the planet motivated and realistic, and not very much time on how they went from basic AI to replacing humans. in order for the left to accept it, you'd need to make the AI suck but kinda work, and yet get mass deployment anyway. it would need to be in touch with the real things that have happened so far.
I imagine something similar is true for pitching this to businesspeople - they'd have to be able to see how it went from the thing they enjoy now to being catastrophic, in a believable way, that doesn't feel like invoking clarketech or relying on altmanhype.
I don't think the answer is as simple as changing terminology or carefully modelling their current viewpoints and bridging the inferential divides.
Indeed, and I think that-this-is-the-case is the message I want communicators to grasp: I have very little reach, but I have significant experience talking to people like this, and I want to transfer some of the knowledge from that experience to people who can use it better.
The thing I've found most useful is to be able to express that significant parts of their viewpoint are reasonable. Eg, one thing I've tried is "AI isn't just stealing our work, it's also stealing our competence". Hasn't stuck, though. I find it helpful to point out that yes, climate change sure is a (somewhat understated) accurate description of what doom looks like.
I do think "allergies" are a good way to think about it, though. They're not unable to consider what might happen if AI keeps going as it is, they're part of a culture that is trying to apply antibodies to AI. And those antibodies include active inference wishcasting like "AI is useless". They know it's not completely useless, but the antibody requires them to not acknowledge that in order for its effect to bind; and their criticisms aren't wrong, just incomplete - the problems they raise with AI are typically real problems, but not high impact ones so much as ones they think will reduce the marketability of AI.
This is the story I use to express what a world where we fail looks like to left-leaning people who are allergic to the idea that AI could be powerful. It doesn't get the point across great, due to a number of things that continue to be fnords for left leaning folks which this story uses, but it works better than most other options. It also doesn't seem too far off what I expect to be the default failure case; though the factories being made of low-intelligence robotic operators seems unrealistic to me.
I opened it now to make this exact point.
This is talking about dem voters or generally progressive citizens, not dem politicians, correct?
people who dislike AI, and therefore could be taking risks from AI seriously, are instead having reactions like this. https://blue.mackuba.eu/skythread/?author=brooklynmarie.bsky.social&post=3lcywmwr7b22i why? if we soberly evaluate what this person has said about AI, and just, like, think about why they would say such a thing - well, what do they seem to mean? they typically say "AI is destroying the world", someone said that in the comments; but then roll their eyes at the idea that AI is powerful. They say the issue is water consumption - why would someone repeat that idea? Under what framework is that a sensible combination of things to say? what consensus are they trying to build? what about the article are they responding to?
I think there are straightforward answers to these questions that are reasonable and good on behalf of the people who say these things, but are not as effective by their own standards as they could be, and which miss upcoming concerns. I could say more about what I think, but I'd rather post this as leading questions, because I think the reading of the person's posts you'd need to do to go from the questions I just asked to my opinions will build more of the model I want to convey than saying it directly.
But I think the fact that articles like this get reactions like this is an indication that orgs like Anthropic or PauseAI are not engaging seriously with detractors, and trying seriously to do so seems to me like a good idea. It's not my top priority ask for Anthropic, but it's not very far down the virtual list.
But it's just one of many reactions of this category I've seen that seem to me to indicate that people engaging with a rationalist-type negative attitude towards their observations of AI are not communicating successfully with people who have an ordinary-person-type negative attitude towards what they've seen of AI. I suspect that at least a large part of the issue is that rationalists have built up antibodies to a certain kind of attitude and auto-ignore it, despite what I perceive to be its popularity, and as a result don't build intuitive models about how to communicate with such a person.
I suspect fixing this would need to involve creating something new which doesn't have the structural problems in EA which produced this, and would involve talking to people who are non-sensationalist EA detractors but who are involved with similarly motivated projects. I'd start here and skip past the ones that are arguing "EA good" to find the ones that are "EA bad, because [list of reasons ea principles are good, and implication that EA is bad because it fails at its stated principles]"
I suspect that, even without seeking that out, the spirit of EA that made it ever partly good has already metastasized into genpop, and will continue to.
I was someone who had shorter timelines. At this point, most of the concrete part of what I expected has happened, but the "actually AGI" thing hasn't. I'm not sure how long the tail will turn out to be. I only say this to get it on record.
https://www.drmichaellevin.org/research/
https://www.drmichaellevin.org/publications/
it's not directly on alignment, but it's relevant to understanding agent membranes. understanding his work seems useful as a strong exemplar of what one needs to describe with a formal theory of agents and such. particularly interesting is https://pubmed.ncbi.nlm.nih.gov/31920779/
It's not the result we're looking for, but it's inspiring in useful ways.
Yes to both. I don't think Cannell is correct about an implementation of what he said being a good idea, even if it was a certified implementation, and I also don't think his idea is close to ready to implement. Agent membranes still seem at least somewhat interesting; right now, as far as I know, the most interesting work is coming from the Levin lab (Tufts University, Michael Levin), but I'm not happy with any of it for nailing down what we mean by aligning an arbitrarily powerful mind to care about the actual beings in its environment in a strongly durable way.
What is a concise intro that will teach me everything I need to know for understanding every expression here? I'm also asking Claude, interested in input from people with useful physics textbook taste
qaci seems to require the system having an understanding-creating property that makes it a reliable historian. have been thinking about this, have more to say, currently rather raw and unfinished.
hmm actually, I think I was the one who was wrong on that one. https://en.wikipedia.org/wiki/Synaptic_weight seems to indicate the process I remembered existing doesn't primarily work how I thought it did.