Is this related to the bounty, or a separate project?
Furthermore, most of these problems can be addressed just fine in a Bayesian framework. In Jaynes-style Bayesianism, every proposition has to be evaluated in the scope of a probabilistic model; the symbols in propositions are scoped to the model, and we can’t evaluate probabilities without the model. That model is intended to represent an agent’s world-model, which for realistic agents is a big complicated thing.
It still misses the key issue of ontological remodeling. If the world-model is inadequate for expressing a proposition, no meaningful probability could be assigned to it.
Killing oneself with high certainty of effectiveness is more difficult than most assume.
Dying naturally also isn't as smooth as plenty of people assume. I'm pretty sure that "taking things into your hands" leads to higher amount of expected suffering reduction in most cases, and it's not informed rational analysis that prevents people from taking that option.
If a future hostile agent just wants to maximize suffering, will foregoing preservation protect you from it?
Yes? I mean, unless we entertain some extreme abstractions like it simulating all possible minds of certain complexity or whatever.
This isn’t really a problem with alignment
I'd rather put it that resolving that problem is a prerequisite for the notion of "alignment problem" to be meaningful in the first place. It's not technically a contradiction to have an "aligned" superintelligence that does nothing, but clearly nobody would in practice be satisfied with that.
Because humans have incoherent preferences, and it's unclear whether a universal resolution procedure is achievable. I like how Richard Ngo put it, "there’s no canonical way to scale me up".
Hmm, right. You only need assume that there are coherent reachable desirable outcomes. I'm doubtful that such an assumption holds, but most people probably aren't.
We’ll say that a state is in fact reachable if a group of humans could in principle take actions with actuators - hands, vocal chords, etc - that could realize that state.
The main issue here is that groups of humans may in principle be capable of a great many things, but there's a vast chasm between "in principle" and "in practice". A superintelligence worthy of the name would likely be able to come up with plans that we wouldn't in practice be able to even check exhaustively, which is the sort of issue that we want alignment for.
I think that saying that "executable philosophy" has failed is missing Yudkowsky's main point. Quoting from the Arbital page:
To build and align Artificial Intelligence, we need to answer some complex questions about how to compute goodness
He claims that unless we learn how to translate philosophy into "ideas that we can compile and run", aligned AGI is out of the question. This is not a worldview, but an empirical proposition, the truth of which remains to be determined.
There's also an adjacent worldview, which suffuses the Sequences, that it's possible in the relatively short term to become much more generally "rational" than even the smartest uninitiated people, "faster than science" etc., and that this is chiefly rooted in Bayes, Solomonoff & Co. It's fair to conclude that this has largely failed, and IMO Chapman makes a convincing case that this failure was unavoidable. (He also annoyingly keeps hinting that there is a supremely fruitful "meta-rational" worldview instead that he's about to reveal to the world. Any day now. I'm not holding my breath.)
the philosophy department thinks you should defect in a one-shot prisoners’ dilemma
Without further qualifications, shouldn't you? There are plenty of crazy mainstream philosophical ideas, but this seems like a strange example.
Yes, I buy the general theory that he was bamboozled by misleading maps. My claim is that it's precisely the situation where a compass should've been enough to point out that something had gone wrong early enough for the situation to have been salvageable, in a way that sun clues plausibly wouldn't have.
Well, the thing I'm most interested in is the basic compass. From what I can see on the maps, he was going in the opposite direction from the main road for a long time after it should have become obvious that he had been lost. This is a truly essential thing that I've never gone into unfamiliar wilderness without.
If you go out into the wilderness, bring plenty of water. Maybe bring a friend. Carry a GPS unit or even a PLB if you might go into risky territory. Carry the 10 essentials.
Most people who die in the wilderness have done something stupid to wind up there. Fewer people die who have NOT done anything glaringly stupid, but it still happens, the same way. Ewasko’s case appears to have been one of these.
Hmm, so is there evidence that he did in fact follow those common-sense guidelines and died in spite of that? Google doesn't tell me what was found alongside his remains besides a wallet.
They don't, of course, but if you're lucky enough not to be located among the more zealous of them and be subjected to mandatory struggle sessions, their wrath will generally be pointed at more conspicuous targets. For now, at least.
We have a significant comparative advantage to pretty much all of Western philosophy.
I do agree that there are some valuable Eastern insights that haven't yet penetrated the Western mainstream, so work in this direction is worth a try.
We believe we’re in a specific moment in history where there’s more leverage than usual, and so there’s opportunity. We understand that chances are slim and dim.
Also reasonable.
We have been losing the thread to ‘what is good’ over the millennia. We don’t need to reinvent the wheel on this; the answers have been around.
Here I disagree. I think that much of "what is good" is contingent on our material circumstances, which are changing ever faster these days, so it's no surprise that old answers no longer work as well as they did in their time. Unfortunately, nobody has yet discovered a reliable way to update them in a timely manner, and very few seem to even acknowledge this problem.
I don't think that the intelligence and military establishments are likely to be much bigger reckless idiots than Altman and co.; what seems more probable is that their interests and attitudes genuinely align.
most modern humans are terribly confused about morality
The other option is being slightly less terribly confused, I presume.
This is why MAPLE exists, to help answer the question of what is good
Do you consider yourselves to have a significant comparative advantage in this area relative to all the other moral philosophers throughout the millennia whose efforts weren't enough to lift humanity from the aforementioned dismal state?
Oh, sure, I agree that an ASI would understand all of that well enough, but even if it wanted to, it wouldn't be able to give us either all of what we think we want, or what we would endorse in some hypothetical enlightened way, because neither of those things constitutes a coherent framework that robustly generalizes far out-of-distribution for human circumstances, even for one person, never mind the whole of humanity.
The best we could hope for is that some-true-core-of-us-or-whatever would generalize in such a way, and that the AI recognizes this and propagates it while sacrificing inessential contradictory parts. But given that our current state of moral philosophy is hopelessly out of its depth relative to this, to the extent that people rarely even acknowledge these issues, trusting that AI would get this right seems like a desperate gamble to me, even granting that we somehow could make it want to.
Of course, it doesn't look like we would get to choose not to be subjected to a gamble of this sort even if more people were aware of it, so maybe it's better for them to remain in blissful ignorance for now.
I expect this because humans seem agent-like enough that modeling them as trying to optimize for some set of goals is a computationally efficient heuristic in the toolbox for predicting humans.
Sure, but the sort of thing that people actually optimize for (revealed preferences) tends to be very different from what they proclaim to be their values. This is a point not often raised in polite conversation, but to me it's a key reason for the thing people call "value alignment" being incoherent in the first place.
But meditation is non-addictive.
Why not? An ability to get blissed-out on demand sure seems like it could be dangerous. And, relatedly, I have seen stuff mentioning jhana addicts a few times.
Indeed, from what I see there is consensus that academic standards on elite campuses are dramatically down, likely this has a lot to do with the need to sustain holistic admissions.
As in, the academic requirements, the ‘being smarter’ requirement, has actually weakened substantially. You need to be less smart, because the process does not care so much if you are smart, past a minimum. The process cares about… other things.
So, the signalling value of their degrees should be decreasing accordingly, unless one mainly intends to take advantage of the process. Has some tangible evidence of that appeared already, and are alternative signalling opportunities emerging?
I think Scott’s name is not newsworthy either.
Metz/NYT disagree. He doesn't completely spell out why (it's not his style), but, luckily, Scott himself did:
If someone thinks I am so egregious that I don’t deserve the mask of anonymity, then I guess they have to name me, the same way they name criminals and terrorists.
Metz/NYT considered Scott to be bad enough to deserve whatever inconveniences/punishments would come to him as a result of tying his alleged wrongthink to his real name, is the long and short of it.
Right, the modern civilization point is more about the "green" archetype. The "yin" thing is of course much more ancient and subtle, but even so I doubt that it (and philosophy in general) was a major consideration before the advent of agriculture leading to greater stability, especially for the higher classes.
and another to actually experience the insights from the inside in a way that shifts your unconscious predictions.
Right, so my experience around this is that I'm probably one of the lucky ones in that I've never really had those sorts of internal conflicts that make people claim that they suffer from akrasia, or excessive shame/guilt/regret. I've always been at peace with myself in this sense, and so reading people trying to explain their therapy/spirituality insights usually makes me go "Huh, so apparently this stuff doesn't come naturally to most people, shame that they have to bend over backwards to get to where I have always been. Cool that they have developed all these neat theoretical constructions meanwhile though."
Maybe give some of it a try if you haven’t already, see if you feel motivated to continue doing it for the immediate benefits, and then just stick to reading about it out of curiosity if not?
Trying to dismiss the content of my thoughts does seem to help me fall asleep faster (sometimes), so there's that at least :)
Thanks for such a thorough response! I have enjoyed reading your stuff over the years; of all the spirituality-positive people, I find your approach especially lucid and reasonable, up there with David Chapman's.
I also agree with many of the object-level claims that you say spiritual practices helped you reach, like the multi-agent model of mind, cognitive fusion, etc. But, since I seem to be able to make sense of them without having to meditate myself, it has always left me bemused as to whether meditation really is the "royal road" to these kinds of insight, and if whatever extra it might offer is worth the effort. Like, for example, I already rate my life satisfaction at around 7, and this seems adequate given my objective circumstances.
So, I guess, my real question for the therapy and spirituality-positive people is why they think that their evidence for believing what they believe is stronger than that of other people in that field who have different models/practices/approaches but about the same amount of evidence for its effectiveness. Granted that RCTs aren't always, or even often, easy, but it seems to me that the default response to lack of strong evidence of that sort, or particularly reliable models of reality like those that justify trusting parachutes even in the absence of RCTs, is to be less sure that you have grasped the real thing. I have no reason to doubt that plenty of therapists/coaches etc. have good evidence that something that they do works, but having a good, complete explanation of what exactly works or why is orders of magnitude harder, and I don't think that anybody in the world could reasonably claim to have the complete picture, or anything close to it.
I think western psychotherapies are predicated on incorrect models of human psychology.
Yet they all seem to have positive effects of similar magnitude. This suggests that we don't understand the mechanism through which they actually work, and it seems straightforward to expect that this extends to less orthodox practices.
RCTs mostly can’t capture the effects of serious practice over a long period of time
But my understanding is that benefits of (good) spiritual practices are supposed to be continuous, if not entirely linear. However much effort you invest correlates with the amount of benefits you get, until enlightenment and becoming as gods.
Some forms of therapy, especially ones that help you notice blindspots or significantly reframe your experience or relationship to yourself or the world (e.g. parts work where you first shift to perceiving yourself as being made of parts, and then to seeing those parts with love)
What is your take on the Dodo bird verdict, in relation to both therapy and Buddhism-adjacent things? All this stuff seems to be very heavy on personal anecdotes and just-so stories, and light on RCT-type things. Maybe there's a there there, but it doesn't seem like serious systematic study of this whole field has even begun, and there's plenty of suspicious resistance to even the idea of that from certain quarters.
For whatever reason, it looks like when these kinds of delusions are removed, people gravitate towards being compassionate, loving, etc.
This is also a big if true type claim which from the outside doesn't seem remotely clear, and to the extent that it is true causation may well be reversed.
That is, for all its associations with blue (and to a lesser extent, black), rationality (according to Yudkowsky) is actually, ultimately, a project of red. The explanatory structure is really: red (that is, your desires), therefore black (that is, realizing your desires), therefore blue (knowledge being useful for this purpose; knowledge as a form of power).
Almost. The explanation structure is: green (thou art godshatter), therefore red, therefore black, therefore blue. Yudkowsky may not have a green vibe, as you describe it in this series, but he certainly doesn't shy from acknowledging that there's no ultimate escaping from the substrate.
Green is the idea that you don’t have to strive towards anything.
That can only be said by somebody not currently starving, freezing/parched or chased by a tiger. Modern civilization has insulated us from those "green" delights so thoroughly that we have an idealized conception far removed from how things routinely are in the natural world. Self-preservation is the first thing that any living being strives towards, the greenest thing there is; any "yin" can be entertained only when that's sorted out.
But some of them don’t immediately discount the Spokesperson’s false-empiricism argument publicly
Most likely as a part of the usual arguments-as-soldiers political dynamic.
I do think that there's an actual argument to be made that we have much less empirical evidence regarding AIs compared to Ponzis, and plenty of people on both sides of this debate are far too overconfident in their grand theories, EY very much included.
Sure, there is common sense, available to plenty of people, of which reference classes apply to Ponzi schemes (but, somehow, not to everybody, far from it). Yudkowsky's point, however, is that the issue of future AIs is entirely analogous, so people who disagree with him on this are as dumb as those taken in by Bernies and Bankmans. Which just seems empirically false - I'm sure that the proportion of AI doom skeptics among ML experts is much higher than that of Ponzi believers among professional economists. So, if there is progress to be made here, it probably lies in grappling with whatever asymmetries there are between these situations. Telling skeptics for the hundredth time that they're just dumb doesn't look promising.
And due to obvious selection effects, such people are most likely to end up in need of one. Must be a delightful job...
The standard excuse is that the possibility to ruin everything was a necessary cost of our freedom, which doesn’t make much sense
There's one further objection to this, to which I've never seen a theist respond.
Suppose it's true that freedom is important enough to justify the existence of evil. What's up with heaven then? Either there's no evil there and therefore no freedom (which is still somehow fine, but if so, why the non-heaven rigmarole then?), or both are there and the whole concept is incoherent.
That's probably Kevin's touch. Robin has this almost inhuman detachment, which on the one hand allows him to see things most others don't, but on the other makes communicating them hard, whereas Kevin managed to translate those insights into engaging humanese.
Any prospective "rationality" training has to comprehensively grapple with the issues raised there, and as far as I can tell, they don't usually take center stage in the publicized agendas.
What do people here think about Robin Hanson's view, for example as elaborated by him and Kevin Simler in the book Elephant in the Brain? I've seen surprisingly few mentions/discussions of this over the years in the LW-adjacent sphere, despite Hanson being an important forerunner of the modern rationalist movement.
One of his main theses, that humans are strategic self-deceivers, seems particularly important (in the "big if true" way), yet downplayed/obscure.
To me, the main deficiency is that it doesn't make the possibility, indeed, the eventual inevitability of ontological remodeling explicit. The map is a definite concept: everybody knows what maps look like, that you can always compare them, etc. But you can't readily compare Newtonian and quantum mechanics; they mostly aren't even speaking about the same things.
Well, I blame Yudkowsky for the terminology issue: he took a term with hundreds of years of history and used it mostly in place of another established term which was traditionally sort of in opposition to the former one, no less (rationalism vs empiricism).
As I understand it, Chapman's main target audience wasn't LW, but normal STEM-educated people unsophisticated in the philosophy of science-related issues. Pretty much what Yudkowsky called "traditional rationality".
The map/territory essay: https://metarationality.com/maps-and-territory
Here's Chapman's characterization of LW:
Assuming by “the modern rationality movement” you mean the LessWrong-adjacent subculture, some of what they write is unambiguously meta-rational. The center of gravity is more-or-less rationalism as I use the term, but the subculture is not exclusively that.
Among the (arguably) core LW beliefs that he has criticized over the years are Bayesianism as a complete approach to epistemology, utilitarianism as a workable approach to ethics, and the map/territory metaphor as a particularly apt way to think about the relationship between belief and reality.
Well, so far no such higher power seems forthcoming, and totalizing ideologies grip public imagination as surely as ever, so the need for liberalism-or-something-better is still live, for those not especially into wars.
Of course liberalism has struggles; the whole point of it is that it's the best currently known way to deal with competing interests and value differences short of war. This invites three possible categories of objection: that there is actually a better way, that there is no better way and liberalism also no longer works, or that wars are actually a desirable method of conflict resolution. From what I can tell, yours seem to fall into the second and/or third category, but I'm interested in whether you have anything in the first one.
I don't see a substantial difference between a (good enough) experience machine and an 'aligned' superintelligent Bostromian singleton, so the apparent opposition to the former combined with the enthusiastic support for the latter from the archetypal transhumanist always confused me.
That is, turns itself into a God, while also keeping its heart intact? Well, you can do that too (right?).
Likely wrong. The human heart is a loose amalgamation of heuristics adapted to deal with its immediate surroundings, and couldn't survive ascension to godhood intact. As usual, Scott put it best (the Bay Area transit system analogy), but unfortunately stuck it at the end of a mostly-unrelated post, so it's undeservedly obscure.
David Chapman has been banging on for years now against "Bayesianism"/early LW-style rationality being particularly useful for novel scientific advances, and, separately, against utilitarianism being a satisfactory all-purpose system of ethics. He proposes another "royal road", something something Kegan stage 5 (and maybe also Buddhism for some reason), but, frustratingly, his writings so far are rich in exposition and problem statements but consist mostly of IOUs when it comes to detailed solution approaches. I think that he makes a compelling case that these are open problems, insufficiently acknowledged and grappled with even by non-mainstream communities like the LW-sphere, but is probably overconfident about postmodernism/himself having much useful to offer in the way of answers.
I'd say that, on conflict theory terms, NYT adequately described Scott. They correctly identified him as a contrarian willing to entertain, and maybe even hold, taboo opinions, and to have polite interactions with out-and-out witches. Of course, we may think it deplorable that the 'newspaper of record' considers such people deserving to be publicly named and shamed, but they provided reasonably accurate information to those sharing this point of view.
Maybe I’m missing some context, but wouldn’t it be better for Open AI as an organized entity to be destroyed than for it to exist right up to the point where all humans are destroyed by an AGI that is neither benevolent nor “aligned with humanity” (if we are somehow so objectively bad as to not deserve care by a benevolent powerful and very smart entity).
This seems to presuppose that there is a strong causal effect from OpenAI's destruction to avoiding creation of an omnicidal AGI, which doesn't seem likely? The real question is whether OpenAI was, on the margin, a worse front-runner than its closest competitors, which is plausible, but then the board should have made that case loudly and clearly, because, entirely predictably, their silence has just made the situation worse.
To me the core reason for wide disagreement seems simple enough - at this stage the essential nature of AI existential risk arguments is not scientific but philosophical. The terms are informal and there are no grounded models of underlying dynamics (in contrast with e.g. climate change). Large persistent philosophical disagreements are very much the widespread norm, and thus unsurprising in this particular instance as well, even among experts in currently existing AIs, as it's far from clear how their insights would extrapolate to hypothetical future systems.