Posts

My Thoughts on Takeoff Speeds 2018-03-27T00:05:33.482Z
Expert Iteration From the Inside 2017-11-02T21:11:02.825Z
One-Magisterium Bayes 2017-06-29T23:02:54.784Z
Mode Collapse and the Norm One Principle 2017-06-05T21:30:58.027Z

Comments

Comment by tristanm on Are ethical asymmetries from property rights? · 2018-07-02T04:37:45.928Z · LW · GW

A couple of guesses for why we might see this, which don't seem to depend on property:

  • An obligation to act is much more freedom-constraining than a prohibition on an action. The more one considers all possible actions under an obligation to take the most ethically optimal one, the less room one has for exploration, contemplation, or pursuing one's own selfish values. Prohibitions on actions do not have this effect.
  • The environment we evolved in had roughly the same level of opportunity to commit harmful acts, but far less opportunity to take positive consequentialist action (and far less complicated situations to deal with). It was always possible to hurt your friends and suffer the consequences, but it was rare to have to think about the long-term consequences of every action.
  • The consequences of killing, stealing, and hurting people are easier to predict than those of altruistic actions. Resources are finite, so sharing them can be harmful or beneficial depending on the circumstances and who they are shared with. Other people can defect or refuse to reciprocate. If you hurt someone, they are almost guaranteed to retaliate; if you help someone, there is no guarantee there will be a payoff for you.
Comment by tristanm on OpenAI releases functional Dota 5v5 bot, aims to beat world champions by August · 2018-06-28T01:22:07.538Z · LW · GW

It seems to construct an estimate of it by averaging a huge number of observations together before each update (for Dota 5v5, they say each batch is around a million observations, and I'm guessing it processes about a million batches). The surprising thing is that this works so well, and that it makes it very easy to leverage additional computational resources.
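
To illustrate just the averaging part, here's a minimal toy sketch (made-up numbers and a made-up objective, not their actual training setup): even when each individual observation gives an extremely noisy gradient estimate, averaging over an enormous batch yields a stable update direction, which is roughly why extra compute can simply go into bigger batches.

```python
import numpy as np

# Toy sketch (not OpenAI's actual training code): each per-observation gradient
# estimate is extremely noisy, but averaging a huge batch of them before every
# update yields a stable direction.

def batched_update(params, batch_size, learning_rate=0.1):
    # Hypothetical per-observation gradients: a weak "true" signal (pull params
    # toward zero) buried in noise ten times the signal's magnitude.
    per_obs_grads = -params + np.random.normal(scale=10.0, size=(batch_size, params.size))
    return params + learning_rate * per_obs_grads.mean(axis=0)

params = np.ones(4)
for _ in range(100):
    params = batched_update(params, batch_size=100_000)  # stand-in for ~1M observations per batch
print(params)  # ends up near zero: the noise has averaged out despite its size
```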

My guess for how it deals with partial observability in a more philosophical sense is that it must be able to store an implicit model of the world in some way, in order to better predict the reward it will eventually observe. I'm beginning to wonder whether the distinction between partial and full observability is very useful after all. Even though AlphaGo can see the whole board, there are still a whole bunch of "spaces" it can't see fully: possibly the entire action space, the space of every possible game trajectory, or the mathematical structure of play strategies. And yet it managed to infer enough about those spaces to become good at the game.

Comment by tristanm on OpenAI releases functional Dota 5v5 bot, aims to beat world champions by August · 2018-06-27T13:48:58.793Z · LW · GW

I don't know how hard it would be to do a side-by-side "FLOPS" comparison of Dota 5v5 vs. AlphaGo / AlphaZero, but they seem relatively similar in terms of the computational cost required to achieve something close to "human level". However, as many have noted, Dota is a game of vastly more complexity because of its continuous state, partial observability, large action space, and time horizon. So what does it mean when roughly similar orders of magnitude of compute achieve the same level of ability as humans, using a fairly general architecture and learning algorithm?

Some responses to AlphaGo at the time were along the lines of "Don't worry too much about this, it looks very impressive, but the game still has a discrete action space and is fully observable, so that explains why this was easy."

Comment by tristanm on Rationality and Spirituality - Summary and Open Thread · 2018-04-22T14:52:14.736Z · LW · GW

I've been meditating since I was about 19, which was before I came across rationality / effective altruism. There is quite a bit of overlap between the sets of things I've been able to learn from both schools of thought, but I think there are still a lot of very useful (possibly even necessary) things that right now can only be learned from meditative practices. This is not because rationality is inherently incapable of reaching the same conclusions, but because within rationality it would take very strong and well-developed theories, perhaps built on large-scale empirical observations of human behavior, to get there. With meditation, on the other hand, a lot of these same conclusions are just "obvious."

Most of these things have to do with subtle issues of psychology, particularly with values and morality. For example, before I began meditating, I generally believed that:

  • Moral principles could be determined logically from a set of axioms that were "self-evidently true" and that once I deduced those things, I would simply follow them.
  • The things that seemed to make me happy, like having friends, being in love, and feeling accomplished, were not incompatible with true moral principles, and in fact were instrumentally helpful in achieving terminal moral goals.
  • I intrinsically valued what was moral. If it ever seemed like I valued what was not moral, I could chalk it up to temporary or easily surmountable issues, like vestigial animal instincts or lack of willpower: basically, desires that could be easily overridden.
  • Pleasure, pain, and emotions were more like guidelines, things that made it possible to act quickly in certain situations. Insofar as certain forms of pleasure were "intrinsic values" (like love), they did not interfere with moral goals. They were not things that determined my behavior very strongly, and they certainly didn't have subtle cascading effects on the entire set of my beliefs.

After having meditated for a long time, many of these beliefs were eradicated. Right now it seems more likely that:

  • My values are not even consistent, let alone determined by moral principles. It's not clear that deducing a good set of moral principles could even change my values.
  • My values are malleable, but not easily malleable in a direction that can be controlled by me (not without a ton of meditation, anyway).
  • The formalization of my values in my mind is not a good predictor of what my actions will be. A better predictor involves far more short-term mechanisms in my psyche.
  • The beliefs I had prior to meditating were more likely constructed so that I could report them to other people in a way that would make them more likely to value and approve of me.
  • The values that truly do seem hard to deconstruct are surprisingly selfish. For example, I assumed that I valued approval from other humans because it was an instrumental goal that helped me judge the quality of my actions. It now seems more likely that social approval is in fact an intrinsic goal, which is very worrying to me with regard to my ability to attain my altruistic goals.

If it turns out that meditating has given me better self-reflective capabilities, and the things I've observed are accurate, then this has some pretty far-reaching implications. If I'm not extremely atypical, then most people are probably very blind to their own intrinsic values. This is a worrying prospect for the long-term efficacy of effective altruism.

Hopefully this isn't too controversial to say, but it seems to me like a lot of the main currents within EA are operating more or less along the lines of my prior-to-meditating beliefs. Here I'm thinking about the type of ethics where you are encouraged to maximize your altruistic output: things like "earn to give", "choose only the career that maximizes your ability to be altruistic", "donate as much of your time and energy as you can to being altruistic", etc. Of course EA thought is very diverse, so this doesn't represent all of it. But given the way my values currently seem to be structured, it's probably unrealistic that I could actually fulfill these, unless I experienced an abnormally large amount of happiness for each altruistic act that outweighed most of my other values. It's of course possible that I'm unusually selfish or even a sociopath, but my prior on that is very low.

On the other hand, if my values really are malleable, and it is possible to influence those values, then it makes sense for me to spend a lot of time deciding how that process should proceed. This is only possible because my values are inconsistent. If they were consistent, it would be against my values to change them, but it seems that once a set of values is inconsistent, it could actually make sense to try to alter them. And meditation might turn out to be one of the ways to make these kind of changes to your own mind.

Comment by tristanm on Local Validity as a Key to Sanity and Civilization · 2018-04-09T19:44:59.598Z · LW · GW

It seems like in the vast majority of conversations, we find ourselves closer to the "exposed to the Deepak Chopra version of quantum mechanics and haven't seen the actual version yet" situation than to the "arguing with someone who is far less experienced and knowledgeable than you are on this subject" one. In the latter case, it's easy to see why steelmanning would be counterproductive. If you're a professor trying to communicate a difficult subject, and the student is having trouble understanding your position, it's unhelpful to "steelman" the student (i.e. present a logical-sounding but faulty argument in favor of what the student is saying); it's far more helpful to try to "pass their ITT" by modeling their confusions and intuitions, and then use that to help them understand the correct argument. I can imagine Eliezer and Holden finding themselves in this situation more often than not, since they are both experts in their respective fields and have spent many years refining their reasoning skills and fine-tuning the arguments for their various positions.

But in most situations, most of us, who may not quite know how firm the epistemological ground we stand on really is, are probably using some mixture of flawed intuitions and logic to present our understanding of a topic. We might also be modeling people whom we really respect as being in a similar situation. In that case the line between steelmanning and the ITT becomes a bit blurry. If I know that both of us are using some combination of intuition (prone to bias and sometimes hard to describe), importance weightings of various facts, and different logical pathways to reach some set of conclusions, then both trying to pass each other's ITTs and steelmanning potentially have some utility. The former might help iron out differences in our intuitions and harder-to-formalize disagreements, and the latter might help with actually reaching more formal versions of arguments, or reasoning paths that have yet to be explored.

But I do find it easy to imagine that as I progress in my understanding of and expertise in a particular topic, the benefits of steelmanning relative to the ITT decrease. It's not clear to me, though, that I (or anyone, outside the areas they spend most of their time thinking about) have actually reached this point in situations where we are debating or cooperating on a problem with respected peers.

Comment by tristanm on Local Validity as a Key to Sanity and Civilization · 2018-04-08T14:15:44.415Z · LW · GW

I don't see him as arguing against steelmanning. But the opposite of steelmanning isn't arguing against an idea directly. You've got to be able to steelman an opponent's argument well in order to argue against it well too, or perhaps determine that you agree with it. In any case, I'm not sure how to read a case for locally valid argumentation steps as being in favor of not doing this. Wouldn't it help you understand how people arrive at their conclusions?

Comment by tristanm on April Fools: Announcing: Karma 2.0 · 2018-04-01T14:44:37.182Z · LW · GW

I would also like to have a little jingle or ringtone play every time someone passes over my comments, please implement for Karma 3.0 thanks

Comment by tristanm on Naming the Nameless · 2018-03-23T03:22:27.125Z · LW · GW

What's most unappealing to me about modern, commercialized aesthetics is the degree to which the bandwidth is forced to be extremely high - something I'd call the standardization of aesthetics. When I walk down the street in the financial district of SF, there's not much variety to be found in people's visual styles. Sure, everything looks really nice, but I can't say it doesn't get boring after a while. It's clear that a lot of information is being packed into people's outfits, so I should be able to infer a huge amount about someone just by looking at them. The same goes for websites: there's really only one website design. Can it truly be said that there is something inherently optimal about these designs? I strongly suspect not. There are other forces at play that guarantee convergence without depending on optimality.

Part of it might be the extremely high cost of defection. Since aesthetics is a type of signalling mechanism, most of what Robin Hanson says applies here. It's just usually not worth it to be an iconoclast or truly original. And at some point we just start believing the signals are inherently meaningful, because they've been there for so long. But all it takes is a look at the different types of beauty produced by other cultures, or at different points in human history, to see that this is not the case. The color orange in Silicon Valley might represent "innovation" or "ingenuity" (look at TensorFlow's color scheme), but the orange robes of Buddhist monks evoke serenity, peace, and compassion (though of course the color originally depended on the dyes that were available). However, one can also observe that there is little variety within each culture, suggesting that the same forces pushing toward aesthetic convergence are at play.

The sum of the evidence suggests to me that I am getting an infinitesimal fraction of the possible pleasant aesthetic experiences which could feasibly be created by someone given that they were not subject to signalling constraints. This seems deeply disappointing.

Comment by tristanm on Prize for probable problems · 2018-03-09T18:09:27.107Z · LW · GW

It seems like this objection might be empirically testable, and in fact might be testable even with the capabilities we have right now. For example, Paul posits that AlphaZero is a special case of his amplification scheme. In his post on AlphaZero, he doesn't mention there being an aligned "H" as part of the setup, but if we imagine there to be one, the "H" in the AlphaZero situation is really just a fixed, immutable calculation that determines the game state (win/loss/etc.), which can be performed on any board input, with no risk of the calculation being done incorrectly and no uncertainty about the result. The entire board is visible to H, and every board state can be evaluated by H. H does not need to consult A for assistance in determining the game state, and A does not suggest actions for H to take (H always takes one action). The agent A does not choose which portions of the board are visible to H. Because of this, "H" in this scenario might be better understood as an immutable property of the environment rather than an agent that interacts with A and is influenced by A.

My question is: to what degree is the stable convergence of AlphaZero dependent on these properties? And can we alter the setup of AlphaZero so that some or all of them are violated? If so, then it seems we should be able to actually code up a version in which H still wants to "win" but the independence between A and H is broken, and then see whether this results in "weirder" or unstable behavior. A toy sketch of what I mean is below.
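
To make the proposed experiment concrete, here's a hypothetical toy sketch (nothing like a real AlphaZero implementation): H_fixed is the baseline immutable win/loss calculation over the full board, while H_partial only judges the cells that A chooses to reveal, which is one way of breaking the independence between A and H.

```python
# Hypothetical toy sketch of the proposed experiment, not an actual AlphaZero setup.
# The "game" is a list of +1/-1 cells; the true outcome is the sign of their sum.
import random

def H_fixed(board):
    # Baseline: H is an immutable function of the full board state, with no
    # uncertainty and no dependence on anything A does.
    return 1 if sum(board) > 0 else -1

def H_partial(board, mask):
    # Modified setup: A chooses which cells H gets to see (boolean mask),
    # so H's judgment is now something A can influence.
    visible = [cell for cell, shown in zip(board, mask) if shown]
    return 1 if sum(visible) > 0 else -1

def random_board(n=9):
    return [random.choice([1, -1]) for _ in range(n)]

def adversarial_mask(board):
    # A degenerate "A": reveal only the cells that favor it. This is the kind of
    # A-influences-H behavior the modified training run would need to watch for.
    return [cell == 1 for cell in board]

random.seed(0)
boards = [random_board() for _ in range(1000)]
honest_wins = sum(H_fixed(b) == 1 for b in boards)
gamed_wins = sum(H_partial(b, adversarial_mask(b)) == 1 for b in boards)
print(honest_wins, gamed_wins)  # gamed_wins is near 1000: H's verdict is now steerable by A
```

The interesting question would be whether a full training loop with the second kind of H stays stable or drifts into this sort of gaming.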

Comment by tristanm on Circling · 2018-02-27T18:59:36.715Z · LW · GW

I can't emphasize enough how important the thing you're mentioning here is, and I believe it points to the crux of the issue more directly than most other things that have been said so far. 

We can often weakman postmodernism as making basically the same claim, but this doesn't change the fact that a lot of people are running an algorithm in their head with the textual description "there is no outside reality, only things that happen in my mind." This algorithm seems to produce different behaviors in people than the algorithm "outside reality exists and is important." I think the first algorithm tends to produce behaviors that are a lot more dangerous than the latter, even though it's always possible to make philosophical arguments that make one algorithm seem much more likely to be "true" than the other. It's crucial to realize that not everyone is running the perfectly steelmanned version of such algorithms (the ones about updating our beliefs based on observations of how we update our beliefs), and such things are very tricky to get right.

Even though it's valid to make observations of the form "I observe that I am running a process that produces the belief X in me", it is definitely very risky to create a social norm that says such statements are superior to statements like "X is true", because such norms create a tendency to assign less validity to statements like "X is true". In other words, such a norm can itself become a process that produces the belief "X is not true" when we don't necessarily want to move our beliefs on X just because we begin to understand how the processes work. It's very easy to go from "X is true" to "I observe I believe X is true" to "I observe there are social and emotional influences on my beliefs" to "there are social and emotional influences on my belief in X" to, finally, "X is not true", and I can't help but feel a mistake is being made somewhere in that process.

Comment by tristanm on The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation · 2018-02-25T19:20:42.238Z · LW · GW

I could probably write a lot more about this somewhere else, but I'm wondering if anyone else felt that this paper was kind of shallow. This comment is probably too brief to really do that feeling justice, but I'll decompose it into two things I found disappointing:

  1. "Intelligence" is defined in a way that leaves a lot to be desired. The paper doesn't really define it in a way that makes it qualitatively different from technology in general ("tasks thought to require intelligence" is probably much less useful than "narrowing the set of possible futures into ones that match an agent's preference ordering"). For this reason, the paper imagines a lot of scenarios that amount to basically one party being able to do one narrow task much better than another party. This is not specific enough to narrow us down to any approaches that deal with AI more generally.
  2. As a consequence of the authors' choice to leave their framework somewhat fuzzy, their suggestions for how to respond to the problem take on the same fuzziness. For example, their first suggestion is that policy leaders should consult with AI researchers. This reads a little bit like an applause light; it doesn't offer many suggestions about how to make this more likely, or about how to ensure that policy leaders are well informed enough to choose the right people to be advised by and to take their advice seriously.

Overall I'm happy that these kinds of things can be discussed by a large group of various organizations. But I think any public efforts to work towards mitigating AI risk need to be very careful that they aren't losing something extremely important by trying to appeal to too many uncertainties and disagreements at once.

Comment by tristanm on Mythic Mode · 2018-02-25T02:27:20.250Z · LW · GW

This is partly a reply and partly an addendum to my first comment. I've been thinking about a sort of duality that exists within the rationalist community to a degree and that has become a lot more visible lately, in particular with posts like this. I would refer to it as something like "The Two Polarities of Nonconformists", although I'm sure someone could think of a better name. The way I would describe it is that communities like this one are largely composed of people who feel fairly uncomfortable with the way they are situated in society, either because they are outliers on some dimension of personality, interests, or intellect, or because of the degree to which they are sensitive to the social reality around them. What this leads to is basically a bimodal distribution of people where both modes are outliers, with respect to the distribution of people in general, on one axis (namely the way that social reality is sensed), but on opposite ends. And these two groups differ very strongly in the way their values are formed, and quite possibly even in subtle ways in how reality itself is perceived.

On the one hand, you have the "proper non-conformists", who are somewhat unplugged from the Omega / distributed network you are describing, and who I imagine will have trouble digesting a lot of your claims here. I call them proper non-conformists because they genuinely seem not to feel much of the tension from the network's tugs. I think there's a connection here with people who consider themselves "status blind" or who tend to visualize Slack only in a very concrete, visible sense, like having a lot of wealth. They might tend to have an aversion to heavily socially conscious displays of signalling and things like that. You suspect that this might be what "psychopathy" is, and I think there might be some partial truth to that, but ultimately not in the sense of a psychopath as an evil person or someone completely disconnected from reality as a whole. For example, one might be able to do altruistic things invisibly, in a way that could be very difficult for someone strongly connected to Omega.

On the other hand, you have the people who are very tightly connected to the Matrix and are highly sensitive to its inputs; that increased sensitivity is what made it likely for them to realize it exists in the first place. This group is constantly feeling the network's tugs and can't disconnect from it (perhaps not without Looking, anyway), so a lot of their strategies tend to involve trying to fit into it as well as they can, mastering its tricks, and overall playing the game extremely well. People from this group could learn to be highly charismatic if they chose to, but they might risk being seen as ultra-conformists. They still end up being non-conformists in the sense of not being neurotypical, and because a lot of their practices will seem extreme if one really examines them in detail. This group might have serious difficulty imagining not being inside the network, and may even be skeptical that someone could still function if they weren't.

(I'm going to avoid trying to place anyone in particular in one or the other of these). 

I think the second group will have the upper hand in group coordination problems because they have direct access to all the mechanisms, but they have the disadvantage of being prone to inadequate-equilibria problems. The first group is in the opposite situation. But in the end I think the first group can't really "Look" at the network from within - ideally you want to be inside of it so you have direct access, but also with the ability to see it for what it really is in some sense. So it's possible that "Lookers" in the second group could accomplish much larger things than "Lookers" from the first group.

If that's correct, then it might suggest that, with Looking, attempts to move more toward polarity one might be less fruitful than attempts to move toward polarity two. And our previous inertia seemed to be going more towards polarity one.

Comment by tristanm on Mythic Mode · 2018-02-24T14:39:13.324Z · LW · GW

I think I've been doing "mythic mode" for nearly my entire life, not because I came up with the idea explicitly and intentionally deployed it, but because it sort of happens on its own without any effort. It's not something that happens at every moment, but it happens frequently enough to be a significant driving force.

But it used to be that I didn't notice myself doing this at all; it was just how it felt to be a person with goals and desires, and mythic mode was just how that manifested in my subconscious. Following along in stories and fiction gave rise to a very similar feeling. In some cases, this might have manifested as "trying on" different roles by imagining myself in the story as a character. To this day this still happens to a degree, except that I now notice it happening in real time and question whether I can subtly modify these stories in any way. I'm also much, much more careful about how seriously I take these feelings. I could probably attribute several great missteps to relying on them too heavily.

Since this kind of story-creating and script-writing seems to happen so fluidly and frequently, it seems very difficult to know how and when it arises or is triggered (it appears to happen very subconsciously), and very hard to know how changing any individual piece of it will affect things down the road.

You seem to have re-derived Jungian archetypes, with the distributed network / Omega playing the role of the collective unconscious. I think the main difference is that you posit the distributed intelligence to be able to predict people's actions. I think this is probably more true the more we believe it is true - obviously it sort of "wants" you to be predictable, rewarding you for following some predefined path, which is by definition easier to predict. The punishment for not following these paths is that it removes your ability to predict how the network will respond. How I interpret your advice here is that if we find ourselves unhappy with our state of affairs, we should try to find locations where there are either forks in the paths or places where they are quite close together, to make a quick jump...never straying outside a path for very long.

The main problem I see is that we need to be able to make predictions in the spaces between narratives, which according to this framework is difficult if not impossible.

Comment by tristanm on Spamming Micro-Intentions to Generate Willpower · 2018-02-23T16:05:53.139Z · LW · GW

Or more generally: break up a larger and more difficult task into a chain of subtasks, and see if you have enough willpower to accomplish just the first piece. If you do, allow yourself a sense of accomplishment and use that as a willpower boost to continue, or break up the larger task even further.

Comment by tristanm on Spamming Micro-Intentions to Generate Willpower · 2018-02-23T15:59:24.700Z · LW · GW

If this works and people are able to get themselves to do more complex and willpower heavy tasks that they wouldn't normally be able to do, wouldn't that be a good thing by default? Or are you worried that it would allow people with poorly aligned incentives to do more damage?

Comment by tristanm on Circling · 2018-02-18T16:35:58.196Z · LW · GW

Circling seems like one of those things where both its promoters and detractors vastly overestimate its effects, whether positive or negative. A lot of the responses to this are either "it's pretty cool" or "it's pretty creepy." What about "meh"? The most likely outcome is that Circling does something extremely negligible, if it does anything at all, or, if it does seem to have some benefit, it's because of the extra hour you set aside to think about things without many other distractions. In which case, a question I'd ask is: "What was inadequate about the boring technique that doesn't have a name because it's so obvious?" Or, if you want to make a comparison to meditation, hypnosis, or other such practices: when you stumbled across a Chesterton's Fence, what made you go "Hey, let's try moving this fence 50 feet to the left!"? You'll either get a) something that works pretty much the same as some more traditional practice or b) something that doesn't work at all.

Comment by tristanm on [deleted post] 2018-02-16T23:36:40.894Z

Personally I wonder how much of this disagreement can be attributed to prematurely settling on specific fundamental positions, or to some hidden metaphysics that certain organizations have (perhaps unknowingly) committed to - such as dualism or panpsychism. One of the most salient paragraphs from Scott's article said:

Morality wasn’t supposed to be like this. Most of the effective altruists I met were nonrealist utilitarians. They don’t believe in some objective moral law imposed by an outside Power. They just think that we should pursue our own human-parochial moral values effectively. If there was ever a recipe for a safe and milquetoast ethical system, that should be it. And yet once you start thinking about what morality is – really thinking, the kind where you try to use mathematical models and formal logic – it opens up into these dark eldritch vistas of infinities and contradictions. The effective altruists started out wanting to do good. And they did: whole nine-digit-sums worth of good, spreadsheets full of lives saved and diseases cured and disasters averted. But if you really want to understand what you’re doing – get past the point where you can catch falling apples, to the point where you have a complete theory of gravitation – you end up with something as remote from normal human tenderheartedness as the conference lunches were from normal human food.

I feel like this quote has some extremely deep but subtly stated insight that is in alignment with some of the points you made. Somehow, even if we all start from the position that there is no universal or ultimately real morality, when we apply all of our theorizing, modeling, debate, measurement, thinking, etc., we somehow end up making absolutist conclusions about what the "truly most important thing" is. And I wonder if this is primarily a social phenomenon: in the process of debate and organizing groups of people to accomplish things, it's easier if we all converge to agreement on specific and easy-to-state questions.

A possible explanation for Scott's observed duality - between the "suits" on the one hand, who just want to do the most easily measurable good, and the "weirdos" on the other, who want to converge to rigorous answers on the toughest philosophical questions (and those answers tend to look pretty bizarre), with these often being the same people - is that it has something to do with converging to agreement on relatively formalizable questions. Those questions often appear in two forms: the "easy to measure" kind (how many people are dying from malaria, how poor is this group of people, etc.), and the "easy to model or theorize about" kind (what do we mean by suffering, what counts as a conscious being, etc.), and so you see a divergence of activity and effort between those two forms of questions.

"Easy" is meant in a relative sense, of course. Unfortunately, it seems that the kind of questions that interest you (and which I agree are of crucial importance) fall into the "relatively hard" category, and therefore are much more difficult to organize concerted efforts around.

Comment by tristanm on Rationalist Lent · 2018-02-15T16:09:49.276Z · LW · GW

I'm actually having a hard time deciding which kinds of superstimuli are having too strong a detrimental effect on my actions. The reason is that some superstimuli also act as willpower restorers. Take music, for example. Listening to music does not usually get mentioned as a bad habit, but it is an extremely easy stimulus to access, requires little to no attention or effort to maintain, and, at least for me, tends to amplify mind-wandering and daydreaming. On the other hand, it is a huge mood booster and increases my confidence and determination to complete a lot of other tasks during the day, so in that regard it does seem to be helpful. I could probably say something similar about many superstimuli, and so I wonder whether straight "giving up" would be a less effective strategy than trying to optimize some kind of schedule for using each type of superstimulus.

Comment by tristanm on The end of public transportation. The future of public transportation. · 2018-02-11T21:01:19.418Z · LW · GW

I'm not actually seeing why this post is purely an instance of the conjunction fallacy. A lot of the details he describes are consequences of cars being autonomous, or indirect effects of this. That's not to say there are no errors here, just that I don't think it's merely a list of statements A, B, C, etc. with no causal relationship.

Comment by tristanm on Pseudo-Rationality · 2018-02-07T02:32:24.377Z · LW · GW

If you define "rationality" as having good meta-level cognitive processes for carving the future into a narrower set of possibilities in alignment with your goals, then what you've described is simply a set of relatively poor heuristics for one specific set of goals, namely the gaining of social status and approval. One can have that particular set of goals and still be a relatively good rationalist. Of course, where do you draw the line between "pseudo" and "actual", given that we are all utilizing cognitive heuristics to some degree? I see the line as being drawn somewhat arbitrarily.

Comment by tristanm on "Cheat to Win": Engineering Positive Social Feedback · 2018-02-06T04:51:21.636Z · LW · GW

I think the driving motivator for seeking out high variance in the groups of people I interact with is an implicit belief that my value system is malleable, plus a strong form of modesty concerning my beliefs about what values I should have. Over time I realized that my value system isn't really all that malleable, and my intuitions about it are much more reliable indicators than observing a random sample of people; therefore a much better strategy for fulfilling the goals set by those values is to associate with people who share them.

Comment by tristanm on "Taking AI Risk Seriously" (thoughts by Critch) · 2018-01-31T22:23:45.214Z · LW · GW
If the latter gets too large, then you start getting swarmed with people who want money and prestige but don't necessarily understand how to contribute, who are incentivized to degrade the signal of what's actually important.

During this decade the field of AI in general became one of the most prestigious and high-status academic fields to work in. But as far as I can tell, it hasn't slowed down the rate of progress in advancing AI capability. If anything, it has sped it up - by quite a bit. It's possible that a lot of newcomers to the field are largely driven by the prospect of status gain and money. And there are quite a few "AI" hype-driven startups that have popped up and seem doomed to fail, but despite this, it doesn't seem to be slowing the pace of the most productive research groups. Maybe the key here is that if you suddenly increase the prestige of a scientific field by a dramatic amount, you are bound to get a lot of nonsense or fraudulent activity, but this might be constrained to being outside of serious research circles. And the most serious people working in the field are likely to be helped by the rising tide as well, due to increased visibility and funding to their labs and so on.

It's also my understanding that the last few years (during the current AI boom) have been some of the most successful (financially and productively) for MIRI in their entire history.

Comment by tristanm on "Taking AI Risk Seriously" (thoughts by Critch) · 2018-01-30T05:00:57.648Z · LW · GW

I'm curious whether the rationalsphere / AI risk community has ever experimented with hiring people to work on serious technical problems who aren't fully aligned with the values of the community or not fully invested in it already. It seems like ideological alignment is a major bottleneck to locating and attracting people at the relevant skill and productivity levels, and there might be some benefit to being open about tradeoffs that favor skill and productivity at the expense of complete commitment to solving AI risk.

Comment by tristanm on A LessWrong Crypto Autopsy · 2018-01-29T02:47:37.955Z · LW · GW

(Re-writing this comment from the original to make my point a little more clear).

I think it is probably quite difficult to map someone's decisions on a continuum from really bad to really good if you can't simulate the outcomes of many different possible actions. There's reason to suspect that the "optimal" outcome in any situation looks vastly better than even very good but slightly sub-optimal decisions, and vice versa for the least optimal outcome.

In this case we observed a few people who took massive risks (by devoting their time and energy to understanding or developing a particular technology which very well may have turned out to be a boondoggle) receive massive rewards from its success, although it could have very well turned out differently, based on what everyone knew at the time. I think the arguments for cryptocurrency becoming successful that existed in the past were very compelling, but they weren't exactly airtight logical proofs (and still aren't even now). Not winning hugely because a legitimately large risk wasn't taken isn't exactly "losing" (and while buying bitcoins when they were cheap wasn't a large risk, investing the time and energy to become knowledgeable enough about crypto to know it was worth taking the chance may have been; a lot of the biggest winners were people who were close to the development of cryptocurrencies).

But even so, a few of these winners are close to the LW community and have invested in its development or some of its projects. Doesn't that count for something? Can they be considered part of the community too? I see no reason to keep the definition so strict.

Comment by tristanm on Taking it Private: Short Circuiting Demon Threads (working example) · 2018-01-23T21:12:55.786Z · LW · GW

Mostly I just want people to stop bringing models about the other person's motives or intentions into conversations, and if tabooing words or phrases won't accomplish that, and neither will explicitly enforcing a norm, then I'm fine not going that route. It will most likely involve simply arguing that people should adopt a practice similar to what you mentioned.

Comment by tristanm on Taking it Private: Short Circuiting Demon Threads (working example) · 2018-01-23T21:04:10.156Z · LW · GW

Confusion in the sense of one or both parties coming to the table with incorrect models is a root cause, but this is nearly always the default situation. We ostensibly partake in a conversation in order to update our models to more accurate ones and reduce confusion. So while yes, a lack of confusion would make bad conversations less likely, it also just reduces the need for the conversation to begin with.

And here we’re talking about a specific type of conversation that we’ve claimed is a bad thing and should be prevented. Here we need to identify a different root cause, because “confusion” is too general a root cause to explain these specific types of conversations.

What I’m claiming as a candidate cause is that there are usually other underlying motives for a conversation besides resolving disagreement. In addition, people are bringing models of the other person’s confusion / motives into the discussion, and that’s what I argue is causing problems and is a practice that should be set aside.

I think the Kensho post did spawn demon threads and that these threads contained the characteristics I mentioned in my original comment.

Comment by tristanm on Taking it Private: Short Circuiting Demon Threads (working example) · 2018-01-23T17:55:18.946Z · LW · GW

I'm not really faulting all status games in general, only tactics which force them to become zero-sum. It's basically unreasonable to ask that humans change their value systems so that status doesn't play any role, but what we can do is alter the rules slightly so that outcomes we don't like become improbable. If I'm accused of being uncharitable, I have no choice but to defend myself, because being seen as "an uncharitable person" is not something I want included in anyone's model of me (even in the case where it's true). Even in one-on-one conversations there's no reason to disengage if this claim was made against me, especially when it's a person I trust or admire (more likely if it's a private conversation) and I therefore care a lot about what they think of me. That's where the stickiness of demon threads comes from: disengaging results in the loss of something for either party.

There's a second type of demon thread where participants get dragged into dead ends that are very deep in, without a very clear map of where the conversation is heading. But I think these reduce to the usual problems of identifying and resolving confusion, and can't really be resolved by altering incentives / discussion norms.

Comment by tristanm on Taking it Private: Short Circuiting Demon Threads (working example) · 2018-01-22T19:10:22.570Z · LW · GW

If the goal is for conversations to be making epistemic progress, with the caveat that individual people have additional goals as well (such as obtaining or maintaining high status within their peer group), and Demon Threads “aren’t important” in the sense that they help neither of these goals, then it seems the solution would simply be better tricks participants in a discussion can use in order to notice when these are happening or likely to happen. But I think it’s pretty hard to actually measure how much status is up for grabs in a given conversation. I don’t think it’s literally zero - I remember who said what in a conversation and if they did or didn’t have important insights - but it’s definitely possible that different people come in with different weightings of importance of epistemic progress vs. being seen as intelligent or insightful. The key to the stickiness and energy-vacuum nature of the demon threads, I think, is that if social stakes are involved, they are probably zero-sum, or at least seen that way.

I have personally noticed that many of the candidate “Demon” threads contain a lot of specific phrases that sort of give away that social stakes are involved, and that there could be benefits to tabooing some of these phrases. To give some examples:

  • “You’re being uncharitable.”
  • “Arguing in bad faith.”
  • “Sincere / insincere.”
  • “This sounds hostile” (or other comments about tone or intent).

These phrases are usually but not always negative, as they can be used in a positive sense (i.e. charitable, good faith, etc.) but even in this case they are more often used to show support for a certain side, cheerleading, and so on. Generally, they have the characteristic of making a claim about or describing your opponent’s motives. How often is it actually necessary or useful to make such claims?

In the vast majority of situations, it is next to impossible to know the true motives of your debate partner or other conversation participants, and even in the best case scenario, poor models will be involved (combined with the fact that the internet tends to make this even more difficult). In addition, an important aspect of status games is that it is necessary to hide the fact that a status game is being played. Being “high-status” means that you are perceived as making an insightful and relevant point at the most opportune time. If someone in a conversation is being perceived as making status moves, that is equivalent to being perceived as low status. That means that the above phrases turn into weapons. They contain no epistemically useful information, and they are only being used to make the interaction zero-sum. Why would someone deliberately choose to make an interaction zero-sum? That’s a harder question, but my guess would be that it is a more aggressive tactic to get someone to back down from their position, or just our innate political instincts assuming the interaction is already zero-sum.

There is no need for any conversation to be zero-sum, necessarily. Even conversations where a participant is shown to be incorrect can lead to new insights, and so status benefits could even be conferred on the “losers” of these conversations. This isn’t denying social reality, it just means that it is generally a bad idea to make assumptions about someone else’s intent during a conversation, especially negative assumptions. I have seen these assumptions lead to a more productive discussion literally zero times.

So additional steps I might want to add:

  1. Notice if you have any assumptions or models about your conversation partner’s intents. If yes - just throw them out. Even positive ones won’t really be useful, negative ones will be actively harmful.
  2. Notice your own intents. It’s not wrong to want to gain some status from the interactions. But if you feel that if your partner wins, you lose, ask yourself why. Taking the conversation private might help, but you might also care about your status in the eyes of your partner, in which case turning the discussion private might not change this. Would a different framing or context allow you both to win?
Comment by tristanm on Kenshō · 2018-01-21T07:21:12.514Z · LW · GW

I'm trying to decide whether or not I understand what "looking" is, and I think it's possible I do, so I want to try and describe it, and hopefully get corrected if it turns out I'm very wrong. 

Basically, there's sort of a divide between "feeling" and "Feeling" and it's really not obvious that there should be, since we often make category errors in referring to these things. On the one hand, you might have the subjective feeling of pain, like putting your hand on something extremely hot. Part of that feeling of pain is the very strong sensation on your hand. Another part of the pain is the sense that you should not do that. This Feeling is the part that sucks. This is the part that you don't want. 

It turns out that those two types of subjective experience aren't one and the same, and aren't inseparable. In the vast majority of situations where you notice one occurring, you also notice the other. However (and it's a big however), there are some times when the first type appears without the second type. It just so happens that our brains are wired so that you never notice that specific situation. But it occurs frequently enough that if you could notice, if you could Look, you would immediately discover that it's always been a factor of your experience. And that's what I'm guessing Valentine means when he says "just look up, it's so obvious!" It IS obvious, once you see it, but seeing it for the first time is probably hard.

To describe a situation where I think this is likely to occur, imagine accidentally stubbing your toe, feeling the pain from it, then shortly after being told some stunning news about the death of a loved one from someone else. In that brief moment where you are stunned by the news and your mind shifts to that new piece of information, it briefly loses the sense of suffering from the pain of the stubbed toe, although the sensation is still there. Once your mind has completed the shift, it may return to feeling the unpleasantness of the pain combined with whatever new feeling it received. 

But importantly, it turns out your brain is doing things like the above constantly. I used that example only because the effect is much more pronounced. Your mind does these awareness shifts so frequently and so quickly that you're usually unlikely to be aware of the brief moment where there can be a sensation of something before the associated emotional response. Learning to Look is basically learning to detect when these shifts happen and catch them in the act, and that is why meditation is usually prescribed to make this more likely. It's also about learning that this effect can be controlled to some degree if you have some mastery over your attention.

It also is not really limited to physical sensations. Any kind of thought or state of awareness may have associated positive or negative mental states, and these too can be sort of detached from in a similar way. 

When phrased like the above, the benefits seem obvious. If you had the choice not to suffer, wouldn't you take it? The reason I think Valentine may not state that so bluntly is that there is an equally obvious objection: if I could choose to not suffer on demand, what would prevent me from doing harmful things to myself or others? Would I even be aligned with my current goals? And I think that question needs an extremely careful answer. 

I think the much less obvious and surprising answer is that you are still aligned with your previous goals, and may even be better equipped to reach them, but I don't feel like I have the skills to really argue for this point, and will completely understand any skepticism towards this. 

It's very possible I'm describing something either completely different or at least much more mundane than what Valentine is describing. The biggest factor that leads me to believe this is that Kensho seems to be a much more black-and-white, you-either-see-it-or-you-don't sort of thing, whereas what I'm talking about seems to require a gradual process of noticing, recognizing, and learning to influence.

Comment by tristanm on Beware of black boxes in AI alignment research · 2018-01-19T19:42:04.264Z · LW · GW
We need to understand how the black box works inside, to make sure our version's behavior is not just similar but based on the right reasons.

I think "black box" here can refer to two different things: things in philosophy or science which we do not yet fully understand, and machine learning models like neural networks that seem to capture their knowledge in ways that are uninterpretable to humans.

We will almost certainly require the use of machine learning or AI to model systems that are beyond our capabilities to understand. This may include physics, complex economic systems, the invention of new technology, or yes, even human values. There is no guarantee that a theory that describes our own values can be written down and understood fully by us.

Have you ruled out any kind of theory which would allow you to know for certain that a "black-box" model is learning what you want it to learn, without understanding exactly everything that it has learned? I might not be able to formally verify that my neural network has learned exactly what I want it to (i.e. by extracting the knowledge out of it and comparing it to what I already know), but maybe I have formal proofs about the algorithm it is using, and so I know its knowledge will be fairly robust under certain conditions. It's basically the latter we need to be aiming for.

Comment by tristanm on Security Mindset and Ordinary Paranoia · 2017-11-26T00:15:58.190Z · LW · GW

A sort of fun game that I’ve noticed myself playing lately is to try and predict the types of objections that people will give to these posts, because I think once you sort of understand the ordinary paranoid / socially modest mindset, they become much easier to predict.

For example, if I didn’t write this already, I would predict a slight possibility that someone would object to your implication that requiring special characters in passwords is unnecessary, and that all you need is high entropy. I think these types of objections could even contain some pretty good arguments (I have no idea if there are actually good arguments for it, I just think that it’s possible there are). But even if there are, it doesn’t matter, because objecting to that particular part of the dialogue is irrelevant to the core point, which is to illustrate a certain mode of thinking.

The reason this kind of objection is likely, in my view, is because it is focused on a specific object-level detail, and to a socially modest person, these kinds of errors are very likely to be observed, and to sort-of trigger an allergic reaction. In the modest mindset, it seems to be that making errors in specific details is evidence against whatever core argument you’re making that deviates from the currently mainstream viewpoint. A modest person sees these errors and thinks “If they are going to argue that they know better than the high status people, they at least better be right about pretty much everything else”.

I observed similar objections to some of your chapters in Inadequate Equilibria. For example, some people were opposed to your decision to leave out a lot of object-level details of some of the dialogues you had with people, such as the startup founders. I thought to myself “those object-level details are basically irrelevant, because these examples are just to illustrate a certain type of reasoning that doesn’t depend on the details”, but I also thought to myself “I can imagine certain people thinking I was insane for thinking those details don’t matter!” To a socially modest person, you have to make sure you’ve completely ironed-out the details before you challenge the basic assumptions.

I think a similar pattern to the one you describe above is at work here, and I suspect the point of this work is to show how the two might be connected. I think an ordinary paranoid person is making similar mistakes to a socially under-confident person. Neither will try to question their basic assumptions, because as the assumptions underlie almost all of their conclusions, to judge them as possibly incorrect is equivalent to saying that the foundational ideas the experts said in textbooks or lectures might be incorrect, which is to make yourself higher-status relative to the experts. Instead, a socially modest / ordinary paranoid person will turn that around on themselves and think “I’m just not applying principle A strongly enough” which doesn’t challenge the majority-accepted stance on principle A. To be ordinarily paranoid is to obsess over the details and execution. Social modesty is to not directly challenge the fundamental assumptions which are presided over by the high-status. The result of that is when a failure is encountered, the assumptions can’t be wrong, so it must have been a flaw in the details and execution.

The point is not to show that ordinary paranoia is wrong, or that challenging fundamental assumptions is necessarily good. Rather it’s to show that the former is basically easy and the latter is basically difficult.

Comment by tristanm on Blind Empiricism · 2017-11-13T17:49:02.888Z · LW · GW

My understanding of A/B testing is that you don't need an explicit causal model or a "big theory" in order to use it successfully; you would mostly be using intuitions gained from experience to test hypotheses like "users like the red page better than the blue page", which contain no explicit causal information.

Here you argue that intuitions gained from experience count as hypotheses just as much as causal theories do, and not only that, but that they tend to succeed more often than the big theories do. That depends on what you consider to be "success", I think. I agree that empirically gained intuitions probably have a lower failure rate than causal theories (you won't do much worse than average), but what Eliezer is mainly arguing is that you won't do much better than average, either.

And as long as you don't mind just doing OK on average, that might be fine. But the main thing this book grapples with is "how do I know when I can do a lot better than average?" And that seems to depend on whether or not you have a good "big theory" available.

Comment by tristanm on Blind Empiricism · 2017-11-13T16:00:14.726Z · LW · GW

I wonder if it would have been as frustrating if he had instead opened with "The following are very loosely based on real conversations I've had, with many of the details changed or omitted." That's something many writers do and get away with, for the very reason that sometimes you want to show that someone actually thinks what you're claiming people think, but you don't actually want to be adversarial to the people involved. Maybe it's not the fairest to the specific arguments, but the alternative could quite possibly turn out worse, or cause a fairly large derail from the main point of the essay, when you start focusing on the individual details of each argument instead of whatever primary pattern you're trying to tie them together with.

Comment by tristanm on In defence of epistemic modesty · 2017-11-08T23:24:57.888Z · LW · GW

There’s a fundamental assumption your argument rests on, which is a choice of prior: assume that everyone’s credences in a given proposition are distributed around the correct value an ideal agent would assign to that proposition if it had access to all the relevant available information and enough time and capacity to process that information to the fullest extent. Your arguments are sound given that this is actually the correct prior, but I see most of your essay as arguing why modesty would be the correct stance given that the assumption is true, and less of it about why that prior is the best one to have in the vast majority of situations.

In practice, the kind of modesty we are actually interested in is the case where we belong to group A, and group A has formed some credence in proposition X through a process that involves argumentation, exchange and acquisition of information, verifying each other’s logical steps, and so on (there's some modesty going on within group A). After this process has gone on for a while, the group consensus around X has converged on a particular value. But there is also another group, group B, which is much larger than A and has been around for longer too. Group B has converged on a very different credence for X than group A. Should group A update their credence in X to be much closer to B’s? Your argument seems to say: yes, basically they should.

I think I would tend to agree with you if the above information was all I had available to me. For all I know, my group A is no more or less effective than group B at reaching the truth, and is not subject to any different systemic biases that would prevent the correct credences from being reached. Since B has been around for longer, and has more members, possibly with experts, I should consider them to have more likely converged to the correct credence in X than group A has.

However, in the cases that we tend to really care about, the places where we think immodesty might be reasonable, group A usually does have some reason to believe that group B is less effective at converging to the correct value. It could be due to systemic issues, inadequate equilibria, Moloch, whatever you want to call it. In other words, we have some reason to think that throwing out the default prior is ok.

So this is where I think the crux of the argument is: how strongly do we expect the distribution of credences on X for group B to not be centered around the correct value? In general I think the key is that A tends to search for the X that maximizes that expectation, in other words, the X that B is most likely to be wrong about. Only after A thinks it’s found the best X does it ask whether immodesty is okay in this specific case. When you factor in that search process, it becomes somewhat more likely that A has actually found a bias that makes the prior wrong. I think there’s an unstated assumption in your essay that X is chosen randomly with respect to the likelihood of bias.
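To make the disagreement explicit, here is a rough formalization of the assumed prior and of the crux as I see it. The notation is my own, not something taken from the essay.

```latex
% Rough formalization (my notation, not the essay's). Each member i of group G
% reports the ideal credence in X plus zero-mean noise:
\[
  c_i^{G} = x^{*} + \epsilon_i^{G}, \qquad \mathbb{E}\!\left[\epsilon_i^{G}\right] = 0 .
\]
% Under that assumption, the mean credence of the larger, older group B is the
% lower-variance estimate of x^{*}, so A should defer. The crux is whether, for
% the particular X that A has deliberately searched out, B's errors are biased
% rather than merely noisy:
\[
  \mathbb{E}\!\left[\epsilon_i^{B}\right] = b_{X} \neq 0 .
\]
```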

Comment by tristanm on An Equilibrium of No Free Energy · 2017-11-01T05:25:24.610Z · LW · GW

I believe that equilibrium has already arrived... which is no real surprise, since no preventive measures were ever put into place.

The reason this equilibrium occurs is that there is a social norm that says "upvote if this post is both easy to understand and contains at least one new insight." If a post contains lots of deep and valuable insights, that increases the likelihood that it is complex, dense, and hard to understand. Hard-to-understand posts often get mistaken for poor writing (or worse, get put in a separate class and compared against academic writing) and face higher scrutiny. Only rarely and with much effort will someone successfully write something that is easy to understand and contains deep insights. (As an example, consider MIRI's research papers, which are more likely to contain much more valuable progress towards a specific problem, but also receive little attention, and are often compared against other academic works, where they face an uphill battle to gain wider acceptance.)

The way around this, if you choose to optimize for social approval and prestige, is to write beautifully written posts that explain a relatively simple concept. Generally, it is much easier to be a brilliant writer than someone who uncovers truly original ideas. It's much easier to use this strategy with our current reward system.

Therefore, what results is basically a lot of amazingly-written articles that very clearly explain a concept you probably could have learned somewhere else.

But we're in for a real treat with this sequence, since it openly acknowledges that it's hard to know if you've found a genuine insight. It's going to get really meta...

Comment by tristanm on Leaders of Men · 2017-10-30T16:12:34.976Z · LW · GW

I think the post could also be interpreted as saying, "when you select for rare levels of super-competence in one trait, you are selecting against competence in most other traits" or at least, "when you select for strong charisma and leadership ability, you are selecting for below-average management ability." It's a little ambiguous how far this is likely to generalize, or just how strongly specific skills are expected to anti-correlate.

Comment by tristanm on Inadequacy and Modesty · 2017-10-29T05:23:42.423Z · LW · GW

I think the central reason it’s possible for an individual to know something better than the solution currently prescribed by mainstream collective wisdom is that the vast number of degrees of freedom in optimizing civilization guarantees that there will always be some potential solutions to problems that simply haven’t received any attention yet. The problem space is simply way, way too large to expect that even relatively easy solutions to certain problems are known yet.

While modesty may be appropriate in situations regarding a problem that is widely visible and considered urgent by society, I think even within this class of problems, there are still so many inefficiencies and non-optimalities that, if you go out looking for one, it’s likely you’ll be able to actually find one. The existence of people who actively go looking for these types of problems, like within Effective Altruism, may demonstrate this.

The stock market is a good example of a problem that is relatively narrow in scope and also receives a huge amount of society’s collective brainpower. But I just don’t think there’s nearly enough brainpower to expect even most of the visible and urgent problems to already have adequate solutions, or even to have solutions proposed.

There may also be dynamics where there is a trade-off between how much energy and effort society would have to spend to implement a specific solution and how much that would subtract from the effort and energy currently needed to support the other mechanisms of civilization. This dynamic may result in the existence of problems that are easy to notice, perhaps even easy to define a solution for, but in practice immensely complex to implement.

For example, it’s within the power of individual non-experts to understand the basic causes of the Great Recession. And there may have been individuals who predicted that it would occur. But it could still have been the case that, actually, it was not feasible for society to simply recognize this and change course quickly enough to avert the disaster.

But rather than society simply saying, once a disaster becomes predictable, “yes, we all know this is a problem, but we really don’t know what to do about it, or if it’s even possible to do anything about it”, the incentive structures are such that it’s easier to spend brainpower coming up with reasons why it’s not really that bad, and perhaps the problem doesn’t even exist in the first place. Therefore the correct answer gets hidden away and the commonly accepted answer is incorrect.

In other words, modesty is most reasonable when the systems that support knowledge accumulation don’t filter out any correct answers.

Comment by tristanm on Continuing the discussion thread from the MTG post · 2017-10-25T13:39:42.797Z · LW · GW

I think avoiding status games is sort of like trying to reach probabilities of zero or one: technically impossible, but you can get arbitrarily close, to the point where the weight that status shifts carry in everyone's decision-making becomes almost too small to measure.

I'm also not sure I would define "not playing the game" as making sure that everyone's relative status within a group stays the same. This is simply a different status game, just with different objectives. It seems to me that what you suggest doing would simply open up a Pandora’s Box of undesirable epistemic issues. Personally, I want the people who consistently produce good ideas and articulate them well to have high status. And if they are doing it better than me, then I want them to have higher status than myself. I want higher status for myself too, naturally, but I channel that desire into practicing and maintaining as many of the characteristics as I can that I believe aid the goals of the community. My goal is almost never to preserve egalitarian reputation at the expense of other goals, even among people I respect, since I fear that trying to elevate that goal to a high priority carries the risk of signal-boosting poor ideas and filtering out good ones. Maybe that’s not what you’re actually suggesting needs to be done; maybe your definition doesn't include things like reputation, but does consider status in the sense of who gets to be socially dominant. I think my crux is that it’s less important to make sure that “mutual respect” and “consider equal in status, to whatever extent status actually means anything” mean the same thing, and more important that the “market” of ideas generated by open discourse maintains a reasonable distribution of reputation.

Comment by tristanm on Continuing the discussion thread from the MTG post · 2017-10-25T07:59:13.457Z · LW · GW

I understand that there may be costs to you for continued interaction with the site, and that your primary motivations may have shifted, but I will say that your continued presence may act as a buffer that slows down the formation of an orthodoxy, and so you may be providing value by staying even if the short-term costs remain negative for a while.

Comment by tristanm on What Evidence Is AlphaGo Zero Re AGI Complexity? · 2017-10-22T19:29:00.834Z · LW · GW

Disagreements here are largely going to revolve around how this observation and similar ones are interpreted. This kind of evidence must push us in some direction. We all agree that what we saw was surprising - a difficult task was solved by a system with no prior knowledge or specific information to this task baked in. Surprise implies a model update. The question seems to be which model.

The debate referenced above is about the likelihood of AGI "FOOM". The Hansonian position seems to be that a FOOM is unlikely because obtaining generality across many different domains at once is unlikely. Is AlphaGo evidence for or against this position?

There is definitely room for more than one interpretation. On the one hand, AG0 did not require any human games to learn from. It was trained via a variety of methods that were not specific to Go itself. It used neural net components that had already been shown to work well on very different domains, such as Atari. This is evidence that the components and techniques used to create a narrow AI system can also be used on a wide variety of domains.

On the other hand, it's not clear whether the "AI system" itself should be considered to be only the trained neural network, or the entire apparatus, including using MCTS to simulate self-play in order to generate supervised training data. The network by itself plays one game; the apparatus learns to play games. You could choose to see this observation instead as "humans tinkered for years to create a narrow system that only plays Go." AG0, once trained, cannot go train on an entirely different game and then know how to play both at a superhuman level (as far as I know, anyway; there are some results suggesting it's possible for some models to learn different tasks in sequence without forgetting). So one hypothesis to update in favor of is "there is a tool that allows a system to learn to do one task; this tool can be applied to many different tasks, but only one task at a time."
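To make the network-versus-apparatus distinction concrete, here is a schematic sketch of the kind of training loop I have in mind. This is not the actual AlphaGo Zero code; every class, function, and number below is an invented stand-in.

```python
# Schematic sketch (not AlphaGo Zero itself): `PolicyValueNet` is "the thing that
# plays one game"; `train_via_self_play` is the surrounding apparatus that uses
# search-guided self-play to manufacture its own supervised targets.
import random

class PolicyValueNet:
    """Stand-in for the trained network: maps a game state to (policy, value)."""
    def predict(self, state):
        policy = [1.0 / 9] * 9          # uniform prior over 9 dummy actions
        value = 0.0                      # neutral position evaluation
        return policy, value

    def fit(self, states, target_policies, target_values):
        pass                             # a gradient update would go here

def mcts_improved_policy(net, state, simulations=50):
    """Stand-in for MCTS: search guided by the net returns a sharper policy."""
    prior, _ = net.predict(state)
    # Real MCTS would refine the prior via simulated rollouts; here we just reuse it.
    return prior

def self_play_game(net):
    """Play one dummy game against itself, recording (state, search_policy, outcome)."""
    history, outcome = [], random.choice([-1, 1])
    for move in range(5):                # 5 dummy moves
        state = ("dummy-state", move)
        history.append((state, mcts_improved_policy(net, state)))
    return [(s, p, outcome) for s, p in history]

def train_via_self_play(net, iterations=3, games_per_iteration=10):
    """The apparatus: generate data by self-play, then treat it as supervised data."""
    for _ in range(iterations):
        data = [ex for _ in range(games_per_iteration) for ex in self_play_game(net)]
        states, policies, values = zip(*data)
        net.fit(states, policies, values)
    return net

trained = train_via_self_play(PolicyValueNet())
```

The point of the sketch is just that the network object is what plays, while the outer loop is what learns to play, and the outer loop is where all the human-designed scaffolding lives.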

But would future, more general, AI systems do something similar to human researchers, in order to train narrow AI subcomponents used for more specific tasks? Could another AI do the "tinkering" that humans do, trained via similar methods? Perhaps not with AG0's training method specifically. But maybe there are other similar, general training algorithms that could do it, and we want to know if this one method that proves to be more general than expected suggests the existence of even more general methods.

It's hard to see how this observation could be evidence against that, but there's also no good way to determine how strongly it counts in favor of it. So I don't see how this can favor Hanson's position at all, but how much it favors EY's is open to debate.

Comment by tristanm on What Evidence Is AlphaGo Zero Re AGI Complexity? · 2017-10-22T16:22:19.475Z · LW · GW

The only things that are required, I believe, are that the full state of the game can be fed into the network as input, and that the action space is discrete and small enough to be represented by the network's output, which allows MCTS to be used. If you can transform an arbitrary problem into this formulation, then in theory the same methods can be used.
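As a sketch of what "this formulation" would require of an arbitrary problem, something like the following interface would have to be satisfiable. The class and method names are my own invention, not from any existing library.

```python
# A sketch of the two requirements named above, written as an interface an
# arbitrary problem would have to satisfy before a self-play-plus-search recipe
# could even be attempted.
from abc import ABC, abstractmethod
from typing import List, Sequence

class SearchableGame(ABC):
    @abstractmethod
    def full_state(self) -> Sequence[float]:
        """The complete state, encodable as a fixed-size numeric input for a network."""

    @abstractmethod
    def legal_actions(self) -> List[int]:
        """A small, discrete action set, so the network can output one score per
        action and tree search can enumerate the branches."""

    @abstractmethod
    def apply(self, action: int) -> "SearchableGame":
        """A simulable transition, needed so MCTS can roll positions forward."""
```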

Comment by tristanm on Seek Fair Expectations of Others’ Models · 2017-10-20T23:21:28.937Z · LW · GW

I think I’m going to stake out a general disagreement position with this post, mainly because: 1) I mostly disagree with it (I am not simply being a devil’s advocate) and 2) I haven’t seen any rebuttals to it yet. Sorry if this response is too long, and I hope my tone does not sound confrontational.

When I first read Eliezer’s post, it made a lot of sense to me and seemed to match with points he’s emphasized many times in the past. I would make a general summarization of the points I’m referring to as: There have been many situations throughout history where policy makers, academics, or other authorities have made fairly strong statements about the future and actions we should collectively take without using any reliable models. This has had pretty disastrous consequences.

In regard to the example of Eliezer supposedly asking an unfair question, the context that I gathered from his post was that this occurred during a very important summit on AI safety and policy between academics and other luminaries. This was supposed to be a conference where these influential people were actually trying to decide on specific courses of action to take, not merely a media-related press extravaganza, or some kind of informal social gathering between important people in a private context. I don’t remember if he actually states what conference it was, but I’m guessing it was probably the Asilomar conference that occurred earlier this year.

If it was an informal social gathering, I think I would agree that it would be sort of unfair to ask random people tough questions like this and expect queued up answers, but as it stands, I’m fairly certain this was an important meeting that could influence the course of events for many years to come. AI safety is only just starting to become accepted in the mainstream, so whatever occurs at these events has to sort of nudge it in the right direction.

So we essentially have a few important reasons why it’s okay to be blunt here and ask tough questions of a panel of AI experts. Eliezer stood up and asked a question he probably expected not to receive a great answer to right away, and he did it in front of a bunch of luminaries who may have been embarrassed by this. So Eliezer broke a social norm, because this could have been interpreted as disrespectful to these people, and he probably lowered his own status in the process. This is a risky move.

But in doing this, he forced them to display a weakness in their understanding of this specific subject. He mentioned that most of them accepted with high confidence that AGI was “really far away” (which I suppose means long enough from now that we don’t have to worry that much). So they must believe they have some model, but under more scrutiny it appears that they really don’t.

You say it’s unfair of Eliezer to expect them to have a good model, and to have a good answer queued up, but I also think it’s unfair to claim AGI is very far away without having any models to back that up. What they say on that stage probably matters a lot.

It’s technically true that the question he asked was unfair, because I am pretty sure he expected not to receive a good answer and that was why he asked it. So perhaps it was not asked purely in the spirit of intellectual discourse, it had rhetorical motivations as well. We can call that unfair if we must.

But I am also fairly certain that it was an important move to make from a consequentialist standpoint. It might have been disrespectful to the panelists, but it could have made them stop to think about it, or perhaps made others see that our understanding isn’t quite good enough to make claims about what definitely should or should not be done about AI safety.

I think he was totally considering social cues and incentive gradients when he did this, and it was precisely because of them that he did. Influential people under a lot of spotlight and public scrutiny will be under more pressure from these things. Therefore in order to give a “nudge”, if you’re someone who also happens to have a bit of influence, you might have to call them out in public a bit. It has a negative cost to you, but in the long run it might pay off.

I think it’s still a reasonable question whether or not this actually will pay off (will they try to consider their models of future AI development more carefully?), but I think his reasoning for doing this was pretty solid. I don’t get the impression that he’s demanding that everyone have a solid model that makes predictions with hard numbers that they can query on demand, nor that he’s suggesting we enforce negative social consequences on everyone for not having one.

Yes, you can always take into account everyone’s circumstances and incentives, but if those are generally pointing in a wrong enough direction for people who have real influence, I think it’s okay to do something about it.

Comment by tristanm on Yudkowsky on AGI ethics · 2017-10-20T18:01:07.260Z · LW · GW

It's very likely that the majority of ethical discussion in AI will become politicized and therefore develop a narrow Overton window, which won't cover the actually important technical work that needs to be done.

The way that I see this happening currently is that ethics discussions have come to largely surround two issues: 1) Whether the AI system "works" at all, even in the mundane sense (could software bugs cause catastrophic outcomes?) and 2) Is it being used to do things we consider good?

The first one is largely just a question of implementing fairly standard testing protocols and developing refinements to existing systems, which is more along the lines of how to prevent errors in narrow AI systems. The same question can be applied to any software system at all, regardless of its status as actually being an "AI". In AGI ethics you pretty much assume that lack of capability is not the issue.

The second question is much more likely to have political aspects; this could include things like "Is our ML system biased against this demographic?" or "Are governments using it to spy on people?" or "Are large corporations becoming incredibly wealthy because of AI and therefore creating more inequality?" I also see this question as applying to any technology whatsoever and not really being specifically about AI. The same things could be asked about statistical models, cameras, or factories. Therefore, much of our current and near-future "AI ethics" discussion will take a similar tack to historical discussions about the ethics of some new technology of the era, like more powerful weapons, nuclear power, faster communications, the spread of new media forms, genetic engineering, and so on. I don't see these discussions as even pertaining to AGI risk in the proper sense, which should be considered in its own class, but they are likely to be conflated with it. Insofar as people generally do not have concrete "data" in front of them detailing exactly how and why something can go wrong, these discussions will probably not have favorable results.

With nuclear weapons, there was some actual "data" available, and that may have been enough to move the Overton window in the right direction, but with AGI there is practically no way of obtaining that with a large enough time window to allow society to implement the correct response.

Comment by tristanm on On the construction of beacons · 2017-10-17T23:11:04.705Z · LW · GW

The interesting question to me is, are sociopaths ever useful to have or are they inherently destructive and malign? You used the analogy of an anglerfish, and I don't think there are many better comparisons you would want to make if your goal is to show something at the furthest (negative) edge of what we consider aesthetically pleasing - one of the most obviously hostile-looking creatures on the planet. To me that certainly seems intentional.

There are sort of three definitions of "sociopath" that get used here, and they often overlap, or perhaps are sort of assumed to be the same. One is the traditional definition of sociopath - someone who basically lacks morals and empathy - and the others are some combination of the Gervais Principle Sociopath and the Chapman Sociopath. The Gervais sociopath seems to be someone who actually is capable of doing good things sometimes, because they are the only ones with both the creative vision and the charisma to organize and convince a bunch of people to do stuff, but they often lie and cheat and create delusions in order to get there. The Chapman kind is similar, but is also someone who comes into social groups just to prey on people and feed their own ego or whatever it is they really want. And they usually cause destruction, which is different from the kind of sociopath that creates value for society or an economy or something like that.

But with respect to community building, or Effective Altruism, the question is: given that your group will invariably come to be composed of various personality types - some geeks, MOPs, and sociopaths - is your primary goal to filter this pool down, let's say by making it hard for sociopaths to find their way in, or just to make it so that everyone's objectives are in some way aligned with the overarching goals, without removing anyone?

By the way, this question does not at all apply to your choice of how to write your posts. I think if you want to write in a way that acts as a high-fidelity signal but not be the brightest beacon possible, that makes perfect sense for personal writings.

Comment by tristanm on On the construction of beacons · 2017-10-16T22:13:12.398Z · LW · GW

Would you mind fleshing this out a bit more? I feel like when you say "overrate Ra" this could be meant in more than one sense - i.e., to overvalue social norms or institutions in general, or, in the specific sense of this discussion, to regard sociopaths as having more inherent worth to an institution or group than they truly have.

Comment by tristanm on On the construction of beacons · 2017-10-16T19:07:03.681Z · LW · GW

The geeks, ideally, prefer to keep their beacon emitting on a very narrow, specific frequency band only – they’d prefer to keep out everyone besides genuine geeks, and therefore their signal will need to be practically invisible to everyone else. This is kind of how I imagine the early proto-rationalist subcultures: you basically have to know exactly where to look to be able to find them, and already possess a large fraction of the required background knowledge.

It would have continued like that, if it wasn’t for the fact that they eventually needed people with the skills to attract attention and resources, and who had the charisma to motivate a lot of work being done. In other words, they needed sociopaths – they didn’t need MOPs so much at first, but they probably thought it would be easier to attract the sociopaths by attracting lots of MOPs first. The sociopaths feed off of the attention and approval of MOPs, so you need the MOPs to get the sociopaths. It is unlikely the sociopaths will be attracted to the original, pure-geek subculture at first.

But I think the key here is that there are no good reasons for the geeks to start emitting on a wider frequency band unless they desperately need help. Generally, when they first start to do this, it is done somewhat haphazardly and without real knowledge of the full consequences. That’s not to say it’s ever done with negligence; it’s just usually extremely difficult to make good predictions about the full effects, since we’re operating at much higher complexity here.

I don’t see a lot of great solutions around this because, as you mention, the sociopaths will invariably seek control. And if you don’t give them control, they will leave, or worse, demolish your reputation, so that you aren’t able to seek support from anyone ever again.

It seems only reasonable then to introduce unbreakable rules. The downside of course is that lots of rationalists are against this sort of thing by their natures - even I feel weird suggesting it - and they prefer the freedom from social norms and expectations, and are more partial to consequentialist ways of behavior. MOPs, however, probably operate more comfortably within rule-systems, which forces the sociopaths to abide by them as well, since it’s much easier for the MOPs to notice and react when a rule has been broken.

You can’t include a rule like “The most trustworthy person is so-and-so” because then the sociopaths will simply try to tarnish that person’s reputation. And you can’t include a rule like “All of our content-creation and decision-making must be accomplished anonymously”, because even though it prevents sociopaths from taking control, they gain nothing from membership in the group. The sociopaths must be able to gain what they want from membership in the group, but they also have to be beholden to the operating protocols of said group without being able to manipulate those protocols. The only way to do that, I think, is to make the protocols very explicit and clear, and have mechanisms that make it very obvious when they are being broken.

Comment by tristanm on What is the pipeline? · 2017-10-06T16:52:18.329Z · LW · GW

To be clear, I do not believe that trying to create such a conspiracy is feasible, and I wanted to emphasize that even if it were possible, you'd still need to have a bunch of other problems already solved (like making an ideal truth-seeking community). Sometimes it seems that rationalists want to have an organization that accomplishes the maximum utilitarian good, and hypothetically, this implies that some kind of conspiracy - if you wish to call it that - would need to exist. For a massively influential and secretive conspiracy, I might assign a < 1% chance of one already existing (in which case it would be too powerful to overcome) and a greater than 99% chance of none existing (in which case it's probably impossible to succeed in creating one).

That said, to solve even just the highest-priority issues of interest to EAs, which probably won't require a massively influential and secretive conspiracy, I think you'd still need to solve the problem of aligning large organizations with these objectives, especially for things like AI, where development and deployment will mainly be carried out by the most enormous and wealthy firms. These are the kinds of organizations that can't be seeded with good intentions from the start. But it seems like you'd still want to have some influence over them in some way.

Comment by tristanm on What is the pipeline? · 2017-10-05T23:24:30.172Z · LW · GW

Let’s suppose we solve the problem of building a truth-seeking community that knows and discovers lots of important things, especially the answers to deep philosophical questions. And more importantly, let's say the incentives of this group were correctly aligned with human values. It would be nice to have a permanent group of people that act as sort of a cognitive engine, dedicated to making sure that all of our efforts stayed on the right track and couldn’t be influenced by outside societal forces, public opinion, political pressure, etc. Like some sort of philosophical query machine that the people who are actually in power, or have influence and a public persona, would have to actually follow directives from – or at least, would face heavy costs if they began to do things against the wishes of this group.

This is sort of like the First versus Second Foundation. The First had all the manpower, finances, technology, and military strength, but the Second made sure everything happened according to the plan. And the First was destined to become corrupt and malign anyway, as this would happen with any large and unwieldy organization that gains too much power.

The problem of course is that the Second Foundation used manipulation, mind control, and outright treachery to influence events. So how would we structure incentives so that our larger and more influential organizations actually have to follow certain directives, especially ones that could possibly change rapidly over time?

Politically this can sometimes be accomplished through democracy, or the threat of revolt, but this never gets us very close to an ideal system. Economically, it can sometimes be accomplished through consumer choice, but when an organization forms a legally sanctioned monopoly or becomes too far separated from the consumer, there is no way to keep it aligned (see Equifax).

This is even a problem with Effective Altruist organizations, because even though there are still options for most philanthropists, the main thing most philanthropic organizations seek is donations from people with very high net worth, so they will mainly be influenced by the wants of those individuals, to the extent that public opinion does not matter.

And to the extent that public opinion does matter, these organizations will have to ensure that they never propose any actions too far outside of the window of social acceptability, and when they do choose to take small steps outside of this window, they may have to partially conceal or limit the transparency of these actions.

And all this does have tangible effects on which projects actually get completed, which things get funded and so on, because we absolutely do need lots of resources to accomplish good things in the world, and the people with the most control over these resources also tend to be the most visible and already tied to lots of different incentive structures that we have almost no ability to override.

I know that LW has managed to seed some people into these organizations so that they are at least exposed to these ideas and so on, and I know that this has had some pretty positive effects, but I am also somewhat skeptical that this will be enough as EA orgs grow and become more mainstream than they are. Every large organization must move towards greater bureaucracy and greater inertia as it grows, and if it becomes misaligned it becomes very difficult for it to change course. Correctly seeding these organizations seems to be the best strategy, but beyond that it is an unsolved problem.

Comment by tristanm on Avoiding Selection Bias · 2017-10-05T19:43:55.575Z · LW · GW

But a thing that strikes me here is this: the LW community really doesn't, so far as I can tell, have the reputation of being a place where ideas don't get frank enough criticism.

But the character of LW has definitely changed. In the early days, when it was gaining its reputation, it seemed to be a place where you could often find lots of highly intense, vociferous debate over various philosophical topics. I feel that nowadays a lot of that character is gone. The community seems to have embraced a much softer and more inclusive discourse style, trying to minimize the overall amount of offense and anxiety one could feel when trying to engage here. And indeed, a large share of the discussion topics these days concern social norms and community norms, which is a pretty big difference from the original LW culture.

I'm not completely sure how I feel about this. On the one hand, we seem to have created a place that is much more approachable (the changes are probably a factor in why I began to take part in the discussions somewhat recently), and less intimidating to newcomers. So I think that part of it is good. On the other hand, some of the best thinkers were people who had a pretty aggressive conversational style (such as EY himself). I think these people have mostly switched over to private discussions among people they trust, with limited public engagement. It's unclear how much we are missing from them.

There are obvious situations where you would desire one style of discourse versus the other: In situations where you need to gather support from many groups with potentially competing interests and complex needs, you would definitely prefer to be as inclusive and inoffensive as possible, while limiting discussion to matters that relate to everyone. In the situation which amounts to casual debate among trusted people, it becomes a lot safer to be as frank and transparent as possible.

LW began very organically, as conversations between many people who already interacted quite frequently merged into a single locus. That meant that lively and open debate could happen somewhat safely, at least initially, but that changed as soon as the space became known outside of its bubble and grew.

There's nothing that can be done to prevent that from happening, I think, but one thing that can be done is to siphon aggressive conversations into separate locations, away from the front-facing side but still possible to enter if you're prepared.

Comment by tristanm on Tensions in Truthseeking · 2017-10-02T23:40:54.086Z · LW · GW

When you talk about “truthseekers”, do you mean someone who is interested in discovering as many truths as possible for themselves, or someone who seeks to add knowledge to the collective knowledge base?

If it’s the latter, then the rationalsphere might not be so easy to differentiate from, say, academia, but if it’s the former, that actually seems to better match what I typically observe about people’s goals within the rationalsphere.

But usually, when someone is motivated to seek truths for themselves, the “truth” isn’t really an end in and of itself, but it’s really the pleasure associated with gaining insight that they’re after. This is highly dependent on how those insights are acquired. Did someone point them out to you, or did you figure it out by yourself?

I often feel that the majority of the pleasure of truthseeking is gained from self-discovery, i.e., “the joy of figuring things out.” This can be a great motivating force for scientists and intellectuals alike.

But the main issue with this is that it’s not necessarily aligned with other instrumental goals. Within academia, there are at least incentive structures that point truthseekers in the direction of things that have yet to be discovered. In the rationalsphere, I have no idea if those incentive structures even exist, or if there are many incentive structures at all (besides gaining respect from your peers). This seems to leave us open to treading old ground quite often, or not making progress quickly enough in the things that are important.

I definitely sympathize with the terminal-truthseekers (I spend a great deal of time learning stuff that doesn’t necessarily help me towards my other goals), but I also think that in regard to this effort of community building we should keep this in mind. Do we want to build a community for the individual or to build the individual for the community?