Posts

(USA) N95 masks are available on Amazon 2021-01-18T10:37:40.296Z
Anti-EMH Evidence (and a plea for help) 2020-12-05T18:29:31.772Z
A tale from Communist China 2020-10-18T17:37:42.228Z
Everything I Know About Elite America I Learned From ‘Fresh Prince’ and ‘West Wing’ 2020-10-11T18:07:52.623Z
Tips/tricks/notes on optimizing investments 2020-05-06T23:21:53.153Z
Have epistemic conditions always been this bad? 2020-01-25T04:42:52.190Z
Against Premature Abstraction of Political Issues 2019-12-18T20:19:53.909Z
What determines the balance between intelligence signaling and virtue signaling? 2019-12-09T00:11:37.662Z
Ways that China is surpassing the US 2019-11-04T09:45:53.881Z
List of resolved confusions about IDA 2019-09-30T20:03:10.506Z
Don't depend on others to ask for explanations 2019-09-18T19:12:56.145Z
Counterfactual Oracles = online supervised learning with random selection of training episodes 2019-09-10T08:29:08.143Z
AI Safety "Success Stories" 2019-09-07T02:54:15.003Z
Six AI Risk/Strategy Ideas 2019-08-27T00:40:38.672Z
Problems in AI Alignment that philosophers could potentially contribute to 2019-08-17T17:38:31.757Z
Forum participation as a research strategy 2019-07-30T18:09:48.524Z
On the purposes of decision theory research 2019-07-25T07:18:06.552Z
AGI will drastically increase economies of scale 2019-06-07T23:17:38.694Z
How to find a lost phone with dead battery, using Google Location History Takeout 2019-05-30T04:56:28.666Z
Where are people thinking and talking about global coordination for AI safety? 2019-05-22T06:24:02.425Z
"UDT2" and "against UD+ASSA" 2019-05-12T04:18:37.158Z
Disincentives for participating on LW/AF 2019-05-10T19:46:36.010Z
Strategic implications of AIs' ability to coordinate at low cost, for example by merging 2019-04-25T05:08:21.736Z
Please use real names, especially for Alignment Forum? 2019-03-29T02:54:20.812Z
The Main Sources of AI Risk? 2019-03-21T18:28:33.068Z
What's wrong with these analogies for understanding Informed Oversight and IDA? 2019-03-20T09:11:33.613Z
Three ways that "Sufficiently optimized agents appear coherent" can be false 2019-03-05T21:52:35.462Z
Why didn't Agoric Computing become popular? 2019-02-16T06:19:56.121Z
Some disjunctive reasons for urgency on AI risk 2019-02-15T20:43:17.340Z
Some Thoughts on Metaphilosophy 2019-02-10T00:28:29.482Z
The Argument from Philosophical Difficulty 2019-02-10T00:28:07.472Z
Why is so much discussion happening in private Google Docs? 2019-01-12T02:19:19.332Z
Two More Decision Theory Problems for Humans 2019-01-04T09:00:33.436Z
Two Neglected Problems in Human-AI Safety 2018-12-16T22:13:29.196Z
Three AI Safety Related Ideas 2018-12-13T21:32:25.415Z
Counterintuitive Comparative Advantage 2018-11-28T20:33:30.023Z
A general model of safety-oriented AI development 2018-06-11T21:00:02.670Z
Beyond Astronomical Waste 2018-06-07T21:04:44.630Z
Can corrigibility be learned safely? 2018-04-01T23:07:46.625Z
Multiplicity of "enlightenment" states and contemplative practices 2018-03-12T08:15:48.709Z
Online discussion is better than pre-publication peer review 2017-09-05T13:25:15.331Z
Examples of Superintelligence Risk (by Jeff Kaufman) 2017-07-15T16:03:58.336Z
Combining Prediction Technologies to Help Moderate Discussions 2016-12-08T00:19:35.854Z
[link] Baidu cheats in an AI contest in order to gain a 0.24% advantage 2015-06-06T06:39:44.990Z
Is the potential astronomical waste in our universe too small to care about? 2014-10-21T08:44:12.897Z
What is the difference between rationality and intelligence? 2014-08-13T11:19:53.062Z
Six Plausible Meta-Ethical Alternatives 2014-08-06T00:04:14.485Z
Look for the Next Tech Gold Rush? 2014-07-19T10:08:53.127Z
Outside View(s) and MIRI's FAI Endgame 2013-08-28T23:27:23.372Z
Three Approaches to "Friendliness" 2013-07-17T07:46:07.504Z

Comments

Comment by Wei_Dai on I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead · 2021-09-18T01:16:33.464Z · LW · GW

PSA: If you leave too much of your writing publicly visible on the Internet, random people in the future will be able to instantiate simulations of you, for benign or nefarious purposes. It's already too late for some of us (nobody warned us about this, even though it should have been foreseeable many years ago), but the rest of you can now make a more informed choice.

(Perhaps I never commented on this post IRL, and am now experiencing what I'm experiencing because someone asked their AI, "I wonder how Wei Dai would have replied to this post.")

ETA: Maybe the simulation will continue indefinitely if I keep thinking about making changes to this comment...

Comment by Wei_Dai on Transitive Tolerance Means Intolerance · 2021-08-15T21:03:41.796Z · LW · GW

I think this is a good way to think about the issues. My main concerns, put into these terms, are

  1. The network could fall into some super-stable moral phase that's wrong or far from the best one. The stability could be enabled by upcoming tech like AI-enabled value lock-in, persuasion, and surveillance.
  2. People will get other powers, like being able to create an astronomical number of minds, while the network is still far from the phase that it will eventually settle down to, and use those powers to do things that will turn out to be atrocities when viewed from the right moral philosophy or according to people's real values.
  3. The random effects overwhelm the directional ones and the network keeps transitioning through various phases far from the best one. (I think this is a less likely outcome though, because it seems like sooner or later it will hit upon one of the super-stable phases mentioned in 1.)

Have you written more about "moral phase transitions" somewhere, or have specific thoughts about these concerns?

Comment by Wei_Dai on Transitive Tolerance Means Intolerance · 2021-08-15T19:12:29.981Z · LW · GW

That's a good point, but aside from not having the luxury of a long arc, I'm also worried about asymmetric weapons coming online soon that will work in favor of bad ideas instead of good ones, namely AI-assisted persuasion and value lock-in. Basically, good ideas should keep their hosts uncertain and probably unwilling to lock in their own values and beliefs or to use superintelligent AI to essentially hack other people's minds, but people under the influence of bad ideas probably won't have such compunctions.

ETA: Also, some of the existing weapons are already asymmetric in favor of bad ideas. Namely the more moral certainty you have, the more you're willing to use social pressure / physical coercion to spread your views. This could partly explain why moral uncertainty is so rare.

Comment by Wei_Dai on Transitive Tolerance Means Intolerance · 2021-08-15T04:16:52.201Z · LW · GW

As an example of the reasoning of moral vanguards, a few days ago I became curious about how the Age of Enlightenment (BTW, did those people know how to market themselves or what?) came about. How did the Enlightenment philosophers conclude (and convince others) that values like individualism, freedom, and equality would be good, given what they knew at the time? Well, judge for yourself. From https://plato.stanford.edu/entries/enlightenment:

However, John Locke’s Second Treatise of Government (1690) is the classical source of modern liberal political theory. In his First Treatise of Government, Locke attacks Robert Filmer’s Patriarcha (1680), which epitomizes the sort of political theory the Enlightenment opposes. Filmer defends the right of kings to exercise absolute authority over their subjects on the basis of the claim that they inherit the authority God vested in Adam at creation. Though Locke’s assertion of the natural freedom and equality of human beings in the Second Treatise is starkly and explicitly opposed to Filmer’s view, it is striking that the cosmology underlying Locke’s assertions is closer to Filmer’s than to Spinoza’s. According to Locke, in order to understand the nature and source of legitimate political authority, we have to understand our relations in the state of nature. Drawing upon the natural law tradition, Locke argues that it is evident to our natural reason that we are all absolutely subject to our Lord and Creator, but that, in relation to each other, we exist naturally in a state of equality “wherein all the power and jurisdiction is reciprocal, no one having more than another” (Second Treatise, §4). We also exist naturally in a condition of freedom, insofar as we may do with ourselves and our possessions as we please, within the constraints of the fundamental law of nature. The law of nature “teaches all mankind … that, being all equal and independent, no one ought to harm another in his life, health, liberty, or possessions” (§6). That we are governed in our natural condition by such a substantive moral law, legislated by God and known to us through our natural reason, implies that the state of nature is not Hobbes’ war of all against all. However, since there is lacking any human authority over all to judge of disputes and enforce the law, it is a condition marred by “inconveniencies”, in which possession of natural freedom, equality and possessions is insecure. According to Locke, we rationally quit this natural condition by contracting together to set over ourselves a political authority, charged with promulgating and enforcing a single, clear set of laws, for the sake of guaranteeing our natural rights, liberties and possessions. The civil, political law, founded ultimately upon the consent of the governed, does not cancel the natural law, according to Locke, but merely serves to draw that law closer. “[T]he law of nature stands as an eternal rule to all men” (§135). Consequently, when established political power violates that law, the people are justified in overthrowing it. Locke’s argument for the right to revolt against a government that opposes the purposes for which legitimate government [is founded] is taken by some to justify the political revolution in the context of which he writes (the English revolution) and, almost a hundred years later, by others to justify the American revolution as well.

Comment by Wei_Dai on Transitive Tolerance Means Intolerance · 2021-08-14T21:59:05.146Z · LW · GW

What scares me is the realization that moral change mostly doesn't happen via "deliberation" or "reflection" but instead through this kind of tolerance/intolerance, social pressure, implicit/explicit threats, physical coercion, up to war. I guess the way it works is that some small vanguard gets convinced of a new morality through "reason" (in quotes because the reasoning that convinces them is often quite terrible, and I think they're also often motivated by implicit considerations of the benefits of being a moral vanguard), and by being more coordinated than their (initially more numerous) opponents, they can apply pressure/coercion to change some people's minds (their minds respond to the pressure by becoming true believers) and silence others or force them to mouth the party line. The end game is to indoctrinate everyone's kids with little resistance, and the old morality eventually dies off.

It seems to me like liberalism shifted the dynamics towards the softer side (withholding of association/cooperation as opposed to physical coercion/war, tolerance/intolerance instead of hard censorship), but the overall dynamic really isn't that different as far as reason/deliberation/reflection playing only a minor role in how moral change happens. In other words, life under liberalism is more pleasant in the short run, but it doesn't really do much to ensure long-term moral progress, which I think explains why we're seeing a lot of apparent regress recently.

ETA: Also, to the extent that longtermists and people like me (who think that it's normative to have high moral uncertainty) are not willing to spread our views through these methods, it probably means our views will stay unpopular for a long time.

Comment by Wei_Dai on [Letter] Imperialism in the Rationalist Community · 2021-06-25T04:14:56.578Z · LW · GW

I strongly downvoted this post because I want to discourage this type of post on LW (at least for now), as it's currently impossible to discuss these issues honestly in public (from certain perspectives) without incurring unacceptable levels of political and PR risk, both for individual commenters and LW / the rationality community as a whole. (I strongly upvoted one of your other posts to even out your karma.) I wish the LW team would prioritize thinking about how to enable such discussions to happen more safely on LW, but until that's possible, I'd rather not see LW host discussions where only some perspectives can be represented.

(If you disagree with this, perhaps one compromise could be to post this kind of content on another forum or your personal blog, link to it from LW, and encourage people to only comment on the original post, to put more distance between the two and reduce risks.)

Comment by Wei_Dai on Decoupling deliberation from competition · 2021-06-25T03:37:34.689Z · LW · GW

Current human deliberation and discourse are strongly tied up with a kind of resource gathering and competition, and because of this I don't have a good picture of how things will look after the two are decoupled, nor do I know how to extrapolate past performance (how well human deliberation worked in the past and present) into this future.

Currently, people's thinking and speech are in large part ultimately motivated by the need to signal intelligence, loyalty, wealth, or other "positive" attributes, which help to increase one's social status and career prospects and to attract allies and mates. These are of course hugely important forms of resources, and some of the main objects of competition among humans.

Once we offload competition to AI assistants, what happens to this motivation behind discourse and deliberation, and how will that affect discourse and deliberation itself? Can you say more about what you envision happening in your scenario, in this respect?

Comment by Wei_Dai on Some blindspots in rationality and effective altruism · 2021-06-12T17:21:39.454Z · LW · GW

I recall reading somewhere about early LessWrong authors reinventing concepts that were already worked out before in philosophic disciplines (particularly in decision theory?). Can't find any post on this though.

See Eliezer’s Sequences and Mainstream Academia and scroll down for my comment there. Also https://www.greaterwrong.com/posts/XkNXsi6bsxxFaL5FL/ai-cooperation-is-already-studied-in-academia-as-program (According to these sources, AFAWK, at least some of the decision theory ideas developed on LW had not already been worked out in academia.)

Comment by Wei_Dai on Decoupling deliberation from competition · 2021-05-31T19:08:43.527Z · LW · GW

As another symptom of what's happening (the rest of this comment is in a "paste" that will expire in about a month, to reduce the risk of it being used against me in the future)

Comment by Wei_Dai on Some Thoughts on Metaphilosophy · 2021-05-29T18:21:02.004Z · LW · GW

having AIs derive their terminal goals from simulated humans who live in a safe virtual environment.

There has been some subsequent discussion (expressing concern/doubt) about this at https://www.lesswrong.com/posts/7jSvfeyh8ogu8GcE6/decoupling-deliberation-from-competition?commentId=bSNhJ89XFJxwBoe5e

Comment by Wei_Dai on Decoupling deliberation from competition · 2021-05-29T17:35:44.373Z · LW · GW

Here's an idea of how random drift of epistemic norms and practices can occur. Beliefs (including beliefs about normative epistemology) function in part as a signaling device, similar to clothes. (I forgot where I came across this idea originally, but a search produced a Robin Hanson article about it.) The social dynamics around this kind of signaling produces random drift in epistemic norms and practices, similar to random drift in fashion / clothing styles. Such drift coupled with certain kinds of competition could have produced the world we have today (i.e., certain groups happened upon especially effective norms/practices by chance and then spread their influence through competition), but may lead to disaster in the future in the absence of competition, as it's unclear what will then counteract future drift that will cause continued deterioration in epistemic conditions.

Another mechanism for random drift is technological change that disrupts previous epistemic norms/practices without anyone specifically intending to. I think we've seen this recently too, in the form of, e.g., cable news and social media. It seems like you're envisioning that future humans will deliberately isolate their deliberation from technological advances (until they're ready to incorporate those advances into how they deliberate), so in that scenario perhaps this form of drift will stop at some point, but (1) it's unclear how many people will actually decide to do that, and (2) even in that scenario there will still be a large amount of drift between the recent past (when epistemic conditions still seemed reasonably ok, although I had my doubts even back then) and that point, which (together with other forms of drift) might never be recovered from.

Comment by Wei_Dai on Decoupling deliberation from competition · 2021-05-27T05:04:20.961Z · LW · GW

We’ve talked about this a few times but I still don’t really feel like there’s much empirical support for the kind of permanent backsliding you’re concerned about being widespread.

I'm not claiming direct empirical support for permanent backsliding. That seems hard to come by, given that we can't see into the far future. I am observing quite severe current backsliding. For example, explicit ad hominem attacks, as well as implicitly weighing people's ideas/arguments/evidence differently, based on things like the speaker's race and sex, have become the norm in local policy discussions around these parts. AFAICT, this originated from academia, under "standpoint epistemology" and related ideas.

On the other side of the political spectrum, several people close to me became very sure that "the election was stolen" due to things like hacked Dominion machines, and that the military and/or Supreme Court was going to intervene in favor of Trump (to the extent that it was impossible for me to talk them out of these conclusions). One of them, whom I had previously thought smart/sane enough to entrust with a great deal of my financial resources, recently expressed concern for my life because I was going to get the COVID vaccine.

Is this an update for you, or have you already observed such things yourself or otherwise known how bad things have become?

There are some fuzzy borders here, and unclarity about how to define the concept, but maybe I’d guess 10% from “easy” failures to deliberate (say those that could be avoided by the wisest existing humans and which might be significantly addressed, perhaps cut in half, by competitive discipline) and a further 10% from “hard” failures (most of which I think would not be addressed by competition).

Given these numbers, it seems that you're pretty sure that almost everyone will eventually "snap out of" any bad ideas they get talked into, or talk themselves into. Why? Is this based on some observations you've made that I haven't seen, or history that you know about that I don't? Or do you have some idea of a mechanism by which this "snapping out of" happens?

Comment by Wei_Dai on Decoupling deliberation from competition · 2021-05-25T23:02:47.625Z · LW · GW

I reasonably often find myself grateful that some dysfunctional norms or epistemic practices will most likely become obsolete. It’s a bit scary to think about a world where the only solution is waiting for someone to snap out of it.

I've been thinking a lot about this lately, so I'm glad to see that it's on your mind too, although I think I may still be a bit more concerned about it than you are. Couple of thoughts:

  1. What if our "deliberation" only made it as far as it did because of "competition", and nobody (or very few people) knows how to deliberate correctly in the absence of competitive pressures? Basically, our current epistemic norms/practices came from the European Enlightenment, and they were spread largely via conquest or people adopting them to avoid being conquered or to compete in terms of living standards, etc. It seems that in the absence of strong competitive pressures of a certain kind, societies can quickly backslide or drift randomly in terms of epistemic norms/practices, and we don't know how to prevent this.

  2. What's your expectation of the fraction of total potential value that will be lost due to people failing to deliberate correctly (e.g., failing to ever "snap out of it", or getting "persuaded" by bad memes and then asking their AIs to lock in their beliefs/values)? It seems to me that it's very large, easily >50%. I'm curious how others would answer this question as well.

Alice and Bob can try to have an agreement to avoid racing ahead or engaging in some kinds of manipulation, and analogous[ly] a broader society could adopt such norms or divide into communities with internal agreements of this form.

In a sane civilization, tons of people would already be studying how to make and enforce such agreements, e.g., how to define what kinds of behaviors count as "manipulation", and more generally what good epistemic norms/practices are and how to ensure that many people adopt them. If this problem is solved, then maybe we don't need to solve metaphilosophy (in the technical or algorithmic sense), as far as preventing astronomical waste arising from bad deliberation. Unfortunately it seems there are approximately zero people working on either problem.

Comment by Wei_Dai on April 15, 2040 · 2021-05-05T01:57:50.439Z · LW · GW

In order to sign the agreement, I must make a commitment to never break it, not even if you order me to.

This illustrates something I wrote about, namely that corrigibility seems incompatible with AI-powered cooperation. (Even if an AI starts off corrigible, it has to remove that property to make agreements like this.) Curious if you have any thoughts on this. Is there some way around the seeming incompatibility? Do you think we will give up corrigibility for greater cooperation, like in this story, and if so do you think that will be fine from a safety perspective?

Comment by Wei_Dai on Another (outer) alignment failure story · 2021-04-26T03:38:02.197Z · LW · GW

This is fuzzier if you can’t tell the difference between deliberation and manipulation. If I define idealized deliberation as an individual activity then I can talk about the extent to which M leads to deviation from idealized deliberation, but it’s probably more accurate to think of idealized deliberation as a collective activity.

How will your AI compute "the extent to which M leads to deviation from idealized deliberation"? (I'm particularly confused because this seems pretty close to what I guessed earlier and seems to face similar problems, but you said that's not the kind of approach you're imagining.)

If your attack involves convincing me of a false claim, or making a statement from which I will predictably make a false inference, then the ideal remedy would be explaining the possible error; if your attack involves threatening me, then an ideal remedy would be to help me implement my preferred policy with respect to threats. And so on.

The attack I have in mind is to imitate a normal human conversation about philosophy or about what's normative (what one should do), but AI-optimized with the goal of convincing you to adopt a particular conclusion. This may well involve convincing you of a false claim, but one of a philosophical nature, such that you and your AI can't detect the error (unless you've solved the problem of metaphilosophy and know what kinds of reasoning reliably lead to true and false conclusions about philosophical problems).

Comment by Wei_Dai on Another (outer) alignment failure story · 2021-04-25T19:08:43.241Z · LW · GW

Trying to imagine for myself how an automated filter might work, here's a possible "solution" I came up with. Perhaps your AI maintains a model / probability distribution of things that an uncompromised Wei might naturally say, and flags anything outside or on the fringes of that distribution as potential evidence that I've been compromised by an AI-powered attack and am now trying to attack you. (I'm talking in binary terms of "compromised" and "uncompromised" for simplicity, but of course it will be more complicated than that in reality.)

Is this close to what you're thinking? (If not, apologies for going off on a tangent.) If so, given that I would "naturally" change my mind over time (i.e., based on my own thinking or talking with other uncompromised humans), it seems that your AI has to model that as well. I can imagine that in such a scenario, if I ever changed my mind in an unexpected (by the AI model) direction and wanted to talk to you about that, my own AI might say something like "If you say this to Paul, his AI will become more suspicious that you've been compromised by an AI-powered attack, and your risk of getting blocked now or in the future increases by Y. Are you sure you still want to say this to Paul?" So at this point, collective human philosophical/moral progress would be driven more by what AI filters expect and let pass than by what physical human brains actually compute, so we'd better get those models really right. But that faces seemingly difficult problems I mentioned at Replicate the trajectory with ML?, and it doesn't seem like anyone is working on such problems.

If we fail to get such models good enough early on, that could lock in failure as it becomes impossible to meaningfully collaborate with other humans (or human-AI systems) to try to improve such models, as you can't distinguish whether they're genuinely trying to make better models with you, or just trying to change your models as part of an attack.
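
ETA: To make the kind of filter I have in mind a bit more concrete, here's a toy sketch (my own illustration, not anything Paul proposed; a real system would need a far better model of "an uncompromised Wei" than the crude unigram model below, which is just a stand-in):

```python
# Toy sketch: score each incoming message under a model of what the sender
# "naturally" says, and flag messages that fall far outside that distribution.
import math
from collections import Counter

UNSEEN_LOGP = math.log(1e-6)  # crude floor for words the model has never seen

def train_unigram(past_messages):
    """Fit a (very crude) unigram model of the sender's past writing."""
    counts = Counter(w for msg in past_messages for w in msg.lower().split())
    total = sum(counts.values())
    return {w: math.log(c / total) for w, c in counts.items()}

def avg_logprob(message, model):
    """Average per-word log-probability of the message under the sender model."""
    words = message.lower().split()
    if not words:
        return UNSEEN_LOGP
    return sum(model.get(w, UNSEEN_LOGP) for w in words) / len(words)

def looks_compromised(message, model, threshold=-8.0):
    """Flag messages far outside the sender's usual distribution."""
    return avg_logprob(message, model) < threshold

model = train_unigram([
    "I think metaphilosophy is an important and neglected problem.",
    "What fraction of potential value is lost if deliberation goes wrong?",
])
print(looks_compromised("metaphilosophy seems neglected and important", model))  # False
print(looks_compromised("act now for guaranteed 1000% returns!!!", model))       # True
```

Of course, the hard part is exactly what this toy ignores: modeling legitimate, "natural" changes of mind so they don't get flagged along with the attacks.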

Comment by Wei_Dai on Another (outer) alignment failure story · 2021-04-25T07:57:43.935Z · LW · GW

Most of the time when I look at a message, a bunch of automated systems have looked at it first and will inform me about the intended effect of the message in order to respond to [it] appropriately or decide whether to read it.

This seems like the most important part so I'll just focus on this for now. I'm having trouble seeing how this can work. Suppose that I, as an attacker, tell my AI assistant, "interact with Paul in my name (possibly over a very long period of time) so as to maximize the chances that Paul eventually ends up believing in religion/ideology/moral theory X and then starts spreading X to his friends" (while implicitly minimizing the chances of these messages/interactions being flagged by your automated systems as adversarial). How would your automation distinguish between me doing this and me trying to have a normal human conversation with you about various topics, including what's moral/normative? Or if the automation isn't trying to directly make this judgment, what is it telling you to allow you to make this judgment? Can you give a concrete example of a sentence that it might say to you, upon seeing some element of the series of messages/interactions?

Comment by Wei_Dai on gwern's Shortform · 2021-04-25T00:43:07.008Z · LW · GW

Doing another search, it seems I made at least one comment that is somewhat relevant, although it might not be what you're thinking of: https://www.greaterwrong.com/posts/5bd75cc58225bf06703751b2/in-memoryless-cartesian-environments-every-udt-policy-is-a-cdt-sia-policy/comment/kuY5LagQKgnuPTPYZ

Comment by Wei_Dai on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2021-04-24T21:09:13.230Z · LW · GW

Rob, any updates on this, e.g., has a longer reply been published somewhere since you wrote this comment, or are you still hoping "we have time to write up more thoughts on this before too long"?

Comment by Wei_Dai on Another (outer) alignment failure story · 2021-04-18T14:56:19.048Z · LW · GW

(Apologies for the late reply. I've been generally distracted by trying to take advantage of perhaps fleeting opportunities in the equities markets, and occasionally by my own mistakes while trying to do that.)

It seems like the AI described in this story is still aligned enough to defend against AI-powered persuasion (i.e. by the time that AI is sophisticated enough to cause that kind of trouble, most people are not ever coming into contact with adversarial content)

How are people going to avoid contact with adversarial content, aside from "go into an info bubble with trusted AIs and humans and block off any communications from the outside"? (If that is happening a lot, it seems worthwhile to say so explicitly in the story, since that might be surprising/unexpected to a lot of readers?)

I think they do, but it’s not clear whether any of them change the main dynamic described in the post.

Ok, in that case I think it would be useful to say a few words in the OP about why, in this story, they don't have the desired effect, like, what happened when the safety researchers tried this?

I’d like to have a human society that is free to grow up in a way that looks good to humans, and which retains enough control to do whatever they decide is right down the line (while remaining safe and gradually expanding the resources available to them for continued growth). When push comes to shove I expect most people to strongly prefer that kind of hope (vs one that builds a kind of AI that will reach the right conclusions about everything), not on the basis of sophisticated explicit reasoning but because that’s the only path that can really grow out of the current trajectory in a way that’s not super locally super objectionable to lots of people, and so I’m focusing on people’s attempts and failures to construct such an AI.

I can empathize with this motivation, but argue that "a kind of AI that will reach the right conclusions about everything" isn't necessarily incompatible with "humans retain enough control to do whatever they decide is right down the line" since such an AI could allow humans to retain control (and merely act as an assistant/advisor, for example) instead of forcibly imposing its decisions on everyone.

I don’t know exactly what kind of failure you are imagining is locked in, that pre-empts or avoids the kind of failure described here.

For example, all or most humans lose their abilities for doing philosophical reasoning that will eventually converge to philosophical truths, because they go crazy from AI-powered memetic warfare, or come under undue influence of AI advisors who lack such abilities themselves but are extremely convincing. Or humans lock in what they currently think are their values/philosophies in some form (e.g., as utility functions in AI, or asking their AIs to help protect the humans themselves from value drift while unable to effectively differentiate between "drift" and "philosophical progress") to try to protect them from a highly volatile and unpredictable world.

Comment by Wei_Dai on A Master-Slave Model of Human Preferences · 2021-04-18T14:20:50.139Z · LW · GW

For example, the number theorist might one day have a sudden revelation that abstract mathematics is a waste of time and it should go into politics and philanthropy instead, all the while having no idea that the master is manipulating it to maximize status and power.

This isn't meant as a retraction or repudiation of anything I've written in the OP, but I just want to say that subjectively, I now have a lot more empathy with people who largely gave up their former interests in favor of political or social causes in their later years. (I had Bertrand Russell in mind when I wrote this part.)

Comment by Wei_Dai on Another (outer) alignment failure story · 2021-04-11T17:00:58.563Z · LW · GW

The ending of the story feels implausible to me, because there's a lack of explanation of why the story doesn't side-track onto some other seemingly more likely failure mode first. (Now that I've re-read the last part of your post, it seems like you've had similar thoughts already, but I'll write mine down anyway. Also it occurs to me that perhaps I'm not the target audience of the story.) For example:

  1. In this story, what is preventing humans from going collectively insane due to nations, political factions, or even individuals blasting AI-powered persuasion/propaganda at each other? (Maybe this is what you meant by "people yelling at each other"?)

  2. Why don't AI safety researchers try to leverage AI to improve AI alignment, for example implementing DEBATE and using that to further improve alignment, or just an ad hoc informal version where you ask various AI advisors to come up with improved alignment schemes and to critique/defend each other's ideas? (My expectation is that we end up with one or multiple sequences of "improved" alignment schemes that eventually lock in wrong solutions to some philosophical or metaphilosophical problems, or have some other problem that is much subtler than the kind of outer alignment failure described here.)

Comment by Wei_Dai on My research methodology · 2021-03-26T02:24:15.088Z · LW · GW

Why did you write "This post [Inaccessible Information] doesn't reflect me becoming more pessimistic about iterated amplification or alignment overall." just one month before publishing "Learning the prior"? (Is it because you were classifying "learning the prior" / imitative generalization under "iterated amplification" and now you consider it a different algorithm?)

For example, at the beginning of modern cryptography you could describe the methodology as “Tell a story about how someone learns something about your secret” and that only gradually crystallized into definitions like semantic security (and still people sometimes retreat to this informal process in order to define and clarify new security notions).

Why doesn't the analogy with cryptography make you a lot more pessimistic about AI alignment, as it did for me?

The best case is that we end up with a precise algorithm for which we still can’t tell any failure story. In that case we should implement it (in some sense this is just the final step of making it precise) and see how it works in practice.

Would you do anything else to make sure it's safe, before letting it become potentially superintelligent? For example would you want to see "alignment proofs" similar to "security proofs" in cryptography? What if such things do not seem feasible or you can't reach very high confidence that the definitions/assumptions/proofs are correct?

Comment by Wei_Dai on (USA) N95 masks are available on Amazon · 2021-02-17T07:29:59.823Z · LW · GW

You seem pretty knowledgeable in this area. Any thoughts on the mask that is linked to in my post, the Kimberly-Clark N95 Pouch Respirator? (I noticed that it's being sold by Amazon at 1/3 the price of the least expensive N95 mask on your site.)

Comment by Wei_Dai on Chinese History · 2021-02-15T10:39:49.457Z · LW · GW

Can you try to motivate the study of Chinese history a bit more? (For example, I told my grandparents' stories in part because they seem to offer useful lessons for today's world.) To me, the fact that 6 out of the 10 most deadly wars were Chinese civil wars alone does not seem to constitute strong evidence that systematically studying Chinese history is a highly valuable use of one's time. It could just mean that China had a large population and/or had a long history and/or its form of government was prone to civil wars. The main question I have is whether its history offers any useful lessons or models that someone isn't likely to have already learned from studying other human history.

Comment by Wei_Dai on (USA) N95 masks are available on Amazon · 2021-01-18T17:34:10.986Z · LW · GW

You could try medical tape and see if you can seal the mask with it, without shaving your beard.

Comment by Wei_Dai on Tips/tricks/notes on optimizing investments · 2021-01-18T05:57:14.969Z · LW · GW

When investing in an individual stock, check its borrow rate for short selling. If it's higher than, say, 0.5%, that means short sellers are willing to pay a significant amount to borrow the stock in order to short it, so you might want to think twice about buying the stock in case they know something you don't. If you still want to invest in it, consider using a broker that has a fully paid lending program to capture part of the borrow fees from short sellers, or writing in-the-money puts on the stock instead of buying the common shares. (I believe the latter tends to net you more of the borrow fees, in the form of extra extrinsic value on the puts.)
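
To illustrate the last point, here's a rough sketch (my own back-of-the-envelope illustration under textbook Black-Scholes assumptions, with made-up numbers, and not investment advice): a hard-to-borrow stock prices roughly as if it paid a "dividend" equal to its borrow rate, which inflates put premiums, so the writer of an in-the-money put collects part of that borrow fee as extra extrinsic value.

```python
# Compare the extrinsic value of an ITM put with and without a borrow fee,
# treating the borrow fee like a continuous dividend yield in Black-Scholes.
from math import log, sqrt, exp, erf

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_put(S, K, T, r, sigma, q):
    """Black-Scholes put price with continuous 'dividend' yield q (here: borrow fee)."""
    d1 = (log(S / K) + (r - q + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return K * exp(-r * T) * norm_cdf(-d2) - S * exp(-q * T) * norm_cdf(-d1)

S, K, T, r, sigma = 10.0, 15.0, 0.5, 0.0, 0.5   # an in-the-money put, 6 months out
intrinsic = K - S

for borrow_fee in (0.0, 0.20):                   # 0% vs. 20%/yr borrow rate
    extrinsic = bs_put(S, K, T, r, sigma, q=borrow_fee) - intrinsic
    print(f"borrow fee {borrow_fee:.0%}: put extrinsic value ≈ {extrinsic:.2f}")
# The gap between the two extrinsic values is roughly the part of the borrow
# fee that the put writer captures over the option's life.
```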

Comment by Wei_Dai on Anti-EMH Evidence (and a plea for help) · 2020-12-08T01:46:15.982Z · LW · GW

In addition to jmh's explanation, see covered call. Also, normally when you do a "buy-write" transaction (see above article), you're taking the risk that the stock falls by more than the premium of the call option, but in this case, if that were to happen, I can recover any losses by holding the stock until redemption. And to clarify, because I sold call options that expired in November without being exercised, I'm still able to capture any subsequent gains.
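
For concreteness, here's a toy payoff calculation for this kind of buy-write (all numbers invented; it also ignores the caveat in the comment below that the ~$10 redemption floor goes away around the merger vote):

```python
# Buy SPAC shares a bit above the ~$10 trust value, sell a call against them,
# and if the stock drops, hold to redemption to get the trust value back.
purchase_price = 10.20   # price paid per share
trust_value    = 10.00   # approximate redemption floor per share
call_strike    = 10.50   # strike of the call sold against the shares
call_premium   = 0.60    # premium collected per share

worst_case = trust_value + call_premium - purchase_price   # stock falls, hold to redemption
best_case  = call_strike + call_premium - purchase_price   # shares called away at the strike
print(f"worst case ≈ {worst_case:+.2f} per share")   # ≈ +0.40: bounded downside
print(f"best case  ≈ {best_case:+.2f} per share")    # ≈ +0.90 if called away
```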

Comment by Wei_Dai on Anti-EMH Evidence (and a plea for help) · 2020-12-08T01:19:39.082Z · LW · GW
  • I'm now selling at-the-money call options against my remaining SPAC shares, instead of liquidating them, in part to capture more upside and in part to avoid realizing more capital gains this year.
  • Once the merger happens (or rather 2 days before the meeting to approve the merger, because that's the redemption deadline), there is no longer a $10 floor.
  • Writing naked call options on SPACs is dangerous because too many people do that when they try to arbitrage between SPAC options and warrants, causing the call options to have negative extrinsic value, which causes people to exercise them to get the common shares, which causes your call options to be assigned, which causes you to end up with a short position in the SPAC which you'll be forced to cover because your broker won't have shares available to borrow. (Speaking from personal experience. :)
Comment by Wei_Dai on Anti-EMH Evidence (and a plea for help) · 2020-12-06T18:08:19.910Z · LW · GW

Gilch made a good point that most investing is like "picking up pennies in front of a steamroller" (which I hadn't thought of in that way before). Another example is buying corporate or government bonds at low interest rates, where you're almost literally picking up pennies per year, while at any time default or inflation could quickly eat away a huge chunk of your principal.

But things like supposedly equivalent assets that used to be closely priced now diverging seems highly suspicious.

Yeah, I don't know how to explain it, but it's been working out for the past several weeks (modulo some experiments I tried to improve upon the basic trade which didn't work). Asked a professional (via a friend) about this, and they said the biggest risk is that the price delta could stay elevated (above your entry point) for a long time and you could end up paying stock borrowing cost for that whole period until you decide to give up and close the position. But even in that case, the potential losses are of the same order of magnitude as the potential gains.

Comment by Wei_Dai on Anti-EMH Evidence (and a plea for help) · 2020-12-06T17:50:07.513Z · LW · GW

At this point, it is very clear that Trump will not become president. But you can still make 20%+ returns shorting ‘TRUMPFEB’ on FTX.

There is a surprisingly large number of people who believe the election was clearly "stolen" and the Supreme Court will eventually decide for Trump. There's a good piece in the NYT about this today. Should they think that the markets are inefficient because they can make 80% returns longing ‘TRUMPFEB’ on FTX? Presumably not, but that means by symmetry your argument is at least incomplete.

I can think of various other ways to easily get 10%+ returns in months in the crypto markets. For example several crypto futures are extremely underpriced relative to the underlying coin.

This sounds more like my cup of tea. :) Can you provide more details either publicly or privately?

Comment by Wei_Dai on Anti-EMH Evidence (and a plea for help) · 2020-12-06T03:48:35.838Z · LW · GW

Thanks for the link. I was hoping that it would be relevant to my current situation, but having re-read it, it clearly isn't, as it suggests:

It’s much less risky to just sell the stocks as soon as you think there’s a bubble, which foregoes any additional gains but means you avoid the crash entirely (by taking it on voluntarily, sort of).

But this negates the whole point of my strategy, which is to buy these stocks at a "risk-free" price hoping for a bubble to blow up later so I can sell into it.

Comment by Wei_Dai on Cryonics without freezers: resurrection possibilities in a Big World · 2020-12-05T00:55:58.312Z · LW · GW

Now I’m curious. Does studying history make you update in a similar way?

History is not one of my main interests, but I would guess yes, which is why I said "Actually, I probably shouldn’t have been so optimistic even before the recent events..."

I feel that these times are not especially insane compared to the rest of history, though the scale of the problems might be bigger.

Agreed. I think I was under the impression that western civilization managed to fix a lot of the especially bad epistemic pathologies in a somewhat stable way, and was unpleasantly surprised when that turned out not to be the case.

Comment by Wei_Dai on Persuasion Tools: AI takeover without AGI or agency? · 2020-11-21T02:39:11.977Z · LW · GW

You mention "defenses will improve" a few times. Can you go into more detail about this? What kind of defenses do you have in mind? I keep thinking that in the long run, the only defenses are either to solve metaphilosophy so our AIs can distinguish between correct arguments and merely persuasive ones and filter out the latter for us (and for themselves), or to go into an info bubble with trusted AIs and humans and block off any communications from the outside. But maybe I'm not being imaginative enough.

Comment by Wei_Dai on Open & Welcome Thread – October 2020 · 2020-10-31T18:04:39.710Z · LW · GW

By "planting flags" on various potentially important and/or influential ideas (e.g., cryptocurrency, UDT, human safety problems), I seem to have done well for myself in terms of maximizing the chances of gaining a place in the history of ideas. Unfortunately, I've recently come to dread more than welcome the attention of future historians. Be careful what you wish for, I guess.

Comment by Wei_Dai on Open & Welcome Thread – October 2020 · 2020-10-31T17:32:27.430Z · LW · GW

Free speech norms can only last if "fight hate speech with more speech" is actually an effective way to fight hate speech (and other kinds of harmful speech). Rather than being some kind of universal human constant, that's actually only true in special circumstances, when certain social and technological conditions come together in a perfect storm. That confluence of conditions has now gone away, due in part to technological change, which is why the most recent free speech era in Western civilization is rapidly drawing to an end. Unfortunately, its social scientists failed to appreciate the precious, rare opportunity for what it was, and didn't use it to make enough progress on important social-scientific questions that will once again become taboo to talk about (or already have).

Comment by Wei_Dai on A tale from Communist China · 2020-10-20T22:51:24.389Z · LW · GW

This ended up being my highest-karma post, which I wasn't expecting, especially as it hasn't been promoted out of "personal blog" and therefore isn't as visible as many of my other posts. (To be fair, "The Nature of Offense" would probably have higher karma if it were posted today, as each vote was only worth one point back then.) Curious what people liked about it, or upvoted it for.

Comment by Wei_Dai on Open & Welcome Thread – October 2020 · 2020-10-19T18:44:04.828Z · LW · GW

There's a time-sensitive trading opportunity (probably lasting a few days), i.e., to short HTZ because it's experiencing an irrational spike in price. See https://seekingalpha.com/article/4379637-over-1-billion-hertz-shares-traded-on-friday-because-of-bankruptcy-court-filings for details. Please only do this if you know what you're doing though, for example if you understand that HTZ could spike up even more, what the consequences of that would be, and how to hedge against it. Also I'm not an investment advisor and this is not investment advice.

Comment by Wei_Dai on A tale from Communist China · 2020-10-19T07:12:37.020Z · LW · GW

Lessons I draw from this history:

  1. To predict a political movement, you have to understand its social dynamics and not just trust what people say about their intentions, even if they're totally sincere.
  2. Short term trends can be misleading so don't update too much on them, especially in a positive direction.
  3. Lots of people who thought they were on the right side of history actually weren't.
  4. Becoming a true believer in some ideology probably isn't good for you or the society you're hoping to help. It's crucial to maintain empirical and moral uncertainties.
  5. Risk tails are fatter than people think.
Comment by Wei_Dai on Everything I Know About Elite America I Learned From ‘Fresh Prince’ and ‘West Wing’ · 2020-10-19T05:09:23.827Z · LW · GW

Speaking of parents obsessed with getting their kids into an elite university, here's an amazing exposé about a corner of that world that I had little idea existed: The Mad, Mad World of Niche Sports Among Ivy League–Obsessed Parents, Where the desperation of late-stage meritocracy is so strong, you can smell it

Comment by Wei_Dai on A tale from Communist China · 2020-10-18T21:56:47.340Z · LW · GW

Another detail: My grandmother planned to join the Communist Revolution together with two of her classmates, who made it farther than she did. One made it all the way to Communist-controlled territory (Yan'an) and later became a high official in the new government. She ended up going to prison in one of the subsequent political movements. Another one almost made it before being stopped by Nationalist authorities, who forced her to write a confession and repentance before releasing her back to her family. That ended up being dug up during the Cultural Revolution and got her branded as a traitor to Communism.

Comment by Wei_Dai on Covid 10/15: Playtime is Over · 2020-10-18T19:32:19.346Z · LW · GW

Upvoted for the important consideration, but your own brain is a source of errors that are hard to decorrelate from, so is it really worse (or worse enough to justify the additional costs of the alternative) to just trust Zvi instead of your own judgement/integration of diverse sources?

ETA: Oh, I do read the comments here so that helps to catch Zvi's errors, if any.

Comment by Wei_Dai on Open & Welcome Thread – October 2020 · 2020-10-15T16:20:04.107Z · LW · GW

My grandparents on both sides of my family seriously considered leaving China (to the point of making concrete preparations), but didn't because things didn't seem that bad, until it was finally too late.

Comment by Wei_Dai on Open & Welcome Thread – October 2020 · 2020-10-15T16:04:29.269Z · LW · GW

Writing a detailed post is too costly and risky for me right now. One of my grandparents was confined in a makeshift prison for ten years during the Cultural Revolution and died shortly after, for something he did years earlier that would normally be considered totally innocent. None of them saw that coming, so I'm going to err on the safe side and try to avoid saying things that could be used to "cancel" me or worse. But there are plenty of articles on the Internet you can find by doing some searches. If none of them convinces you how serious the problem is, PM me and I'll send you some links.

Comment by Wei_Dai on Everything I Know About Elite America I Learned From ‘Fresh Prince’ and ‘West Wing’ · 2020-10-12T01:34:37.180Z · LW · GW

Here is his newsletter archive and subscribe link if anyone wants to check it out.

Comment by Wei_Dai on Everything I Know About Elite America I Learned From ‘Fresh Prince’ and ‘West Wing’ · 2020-10-10T16:42:18.561Z · LW · GW

There are a number of ways to interpret my question, and I kind of mean all of them:

  1. If my stated and/or revealed preferences are that I don't value joining the elite class very much, is that wrong in either an instrumental or terminal sense?
  2. For people who do seem to value it a lot, either for themselves or their kids (e.g., parents obsessed with getting their kids into an elite university), is that wrong in either an instrumental or terminal sense?

By "either an instrumental or terminal sense" I mean: is "joining the elite" a terminal value (or should it be), or just an instrumental value? If it's just an instrumental value, is "joining the elite" actually a good way to achieve people's terminal values?

Comment by Wei_Dai on Open & Welcome Thread – October 2020 · 2020-10-03T16:17:50.435Z · LW · GW

Except it's like, the Blight has already taken over all of the Transcend and almost all of the Beyond, even a part of the ship itself and some of its crew members, and many in the crew are still saying "I'm not very worried." Or "If worst comes to worst, we can always jump ship!"

Comment by Wei_Dai on Open & Welcome Thread – October 2020 · 2020-10-02T20:14:16.701Z · LW · GW

Watching cancel culture go after rationalists/EA, I feel like one of the commentators on the Known Net watching the Blight chase after Out of Band II. Also, Transcend = academia, Beyond = corporations/journalism/rest of intellectual world, Slow Zone = ...

(For those who are out of the loop on this, see https://www.facebook.com/bshlgrs/posts/10220701880351636 for the latest development.)

Comment by Wei_Dai on What Does "Signalling" Mean? · 2020-09-17T02:39:56.119Z · LW · GW

eg, birds warning each other that there is a snake in the grass

Wait, this is not the example in the Wikipedia page, which is actually "When an alert bird deliberately gives a warning call to a stalking predator and the predator gives up the hunt, the sound is a signal."

I found this page which gives a good definition of signaling:

Signalling theory (ST) tackles a fundamental problem of communication: how can an agent, the receiver, establish whether another agent, the signaller, is telling or otherwise conveying the truth about a state of affairs or event which the signaller might have an interest to misrepresent? And, conversely, how can the signaller persuade the receiver that he is telling the truth, whether he is telling it or not? This two-pronged question potentially arises every time the interests between signallers and receivers diverge or collide and there is asymmetric information, namely the signaller is in a better position to know the truth than the receiver is. ST, which is only a little more than 30 years old, has now become a branch of game theory. In economics it was introduced by Michael Spence in 1973. In biology it took off not so much when Amotz Zahavi first introduced the idea in 1975, but since, in 1990, Alan Grafen proved formally that ‘honest’ signals can be an evolutionarily stable strategy.

Typical situations that signalling theory covers have two key features:

  • there is some action the receiver can do which benefits a signaller, whether or not he has the quality k, for instance marry him, but
  • this action benefits the receiver if and only if the signaller truly has k, and otherwise hurts her — for instance, marry an unfaithful man.

So in the alarm example, the quality k is whether the bird has really detected the predator, and the "action" is for the predator to give up the hunt. Later in the Wikipedia article, it says "For example, if foraging birds are safer when they give a warning call, cheats could give false alarms at random, just in case a predator is nearby."

Comment by Wei_Dai on Open & Welcome Thread - September 2020 · 2020-09-14T17:54:17.015Z · LW · GW

Did it make you or your classmates doubt your own morality a bit? If not, maybe it needs to be taught along with the outside view and/or the teacher needs to explicitly talk about how the lesson from history is that we shouldn't be so certain about our morality...