Comment by kaj_sotala on Some Thoughts on My Psychiatry Practice · 2019-01-18T14:41:05.726Z · score: 4 (2 votes) · LW · GW

I'd expect the response to be something along the lines of: "But I already know I'm not one! I'm [list of reasons for why this is purportedly the case]."

Comment by kaj_sotala on Book Summary: Consciousness and the Brain · 2019-01-18T13:37:10.598Z · score: 12 (3 votes) · LW · GW

Dehaene discusses the "flash of insight" example a bit in the section on unconscious processing. I think the general consensus there is that although solutions can be processed unconsciously, this only works after you've spent some time thinking about them consciously first. It might be something like this: you get an initial understanding during the conscious information-sharing phase, and then, once the relevant brain systems have received the information they need, they can continue crunching the data unconsciously until they have something to present to the rest of the system.

[The mathematician] Hadamard deconstructed the process of mathematical discovery into four successive stages: initiation, incubation, illumination, and verification. Initiation covers all the preparatory work, the deliberate conscious exploration of a problem. This frontal attack, unfortunately, often remains fruitless—but all may not be lost, for it launches the unconscious mind on a quest. The incubation phase—an invisible brewing period during which the mind remains vaguely preoccupied with the problem but shows no conscious sign of working hard on it—can start. Incubation would remain undetected, were it not for its effects. Suddenly, after a good night’s sleep or a relaxing walk, illumination occurs: the solution appears in all its glory and invades the mathematician’s conscious mind. More often than not, it is correct. However, a slow and effortful process of conscious verification is nevertheless required to nail all the details down. [...]
... an experiment by Ap Dijksterhuis comes closer to Hadamard’s taxonomy and suggests that genuine problem solving may indeed benefit from an unconscious incubation period. The Dutch psychologist presented students with a problem in which they were to choose from among four brands of cars, which differed by up to twelve features. The participants read the problem, then half of them were allowed to consciously think about what their choice would be for four minutes; the other half were distracted for the same amount of time (by solving anagrams). Finally, both groups made their choice. Surprisingly, the distracted group picked the best car much more often than the conscious-deliberation group (60 percent versus 22 percent, a remarkably large effect given that choosing at random would result in 25 percent success). The work was replicated in several real-life situations, such as shopping at IKEA: several weeks after a trip there, shoppers who reported putting a lot of conscious effort into their decision were less satisfied with their purchases than the buyers who chose impulsively, without much conscious reflection. 
Although this experiment does not quite meet the stringent criteria for a fully unconscious experience (because distraction does not fully ensure that the subjects never thought about the problem), it is very suggestive: some aspects of problem solving are better dealt with at the fringes of unconsciousness rather than with a full-blown conscious effort. We are not entirely wrong when we think that sleeping on a problem or letting our mind wander in the shower can produce brilliant insights.

I'm not sure why you say that the unconscious modules communicating with each other would necessarily contradict the idea of us being conscious of exactly the stuff that's in the workspace, but I tend to agree that considering the contents of our consciousness and the contents of the workspace to be strictly isomorphic seems to be too strong. I didn't go into that because this post was quite long already. But my own experience is that something like Focusing or IFS tends to create things such as weird visualizations that make you go "WTF was that" - and afterwards it feels like something has definitely shifted on an emotional level. Getting various emotional issues into consciousness feels like it brings them into a focus in a way that lets the system re-process them and may e.g. purge old traumas which are no longer relevant - but the parts of the process that are experienced consciously are clearly just the tip of the iceberg, with most of the stuff happening "under the hood".

This paper also argues something that feels related: Dehaene notes that when we see a chair, we don't just see the raw sensory data, but rather some sensory data plus the concept of a chair, suggesting that the concept of a chair is in the GNW. But what is "a concept of a chair"? The paper argues that according to Dehaene, we have something like the concept of a chair in our consciousness / the GNW, but that this is a problem for Dehaene's theory, because we are never actually aware of an entire concept. Concepts generalize over a broader category, but we are only ever aware of individual instances of that category.

The primary function of concepts [...] is to abstract away [...] so that certain aspects of experiences can be regarded as instances of more wide-ranging, similarity-based categories. In fact, according to the conservative view, it is precisely because concepts always transcend the experiences they apply to that they always remain unconscious. Both Prinz (2012) and Jackendoff (2012) underscore this point:
When I look at a chair, try as I may, I only see a specific chair oriented in a particular way. … it's not clear what it would mean to say that one visually experiences chairness. What kind of experience would that be? A chair seen from no vantage point? A chair from multiple vantage points overlapping? A shape possessed by all chairs? Phenomenologically, these options seem extremely implausible. (Prinz, 2012, p. 74)
Now the interesting thing is that everything you perceive is a particular individual (a token)—you can't perceive categories (types). And you can only imagine particular individuals—you can't imagine categories. If you try to imagine a type, say forks in general, your image is still a particular fork, a particular token. (Jackendoff, 2012, p. 130)
Dehaene (2014, p. 110) expands on the notion that consciousness is like a summary of relevant information by stating that it includes “a multisensory, viewer-invariant, and durable synthesis of the environment.” But neither visual awareness nor any other form of experience contains viewer-invariant representations; on the contrary, possessing a first-person perspective—one that, for sighted people, is typically anchored behind the eyes—is often taken to be a fundamental requirement of bodily self-consciousness (Blanke and Metzinger, 2009). This is quite pertinent to the main topic of this article because, according to the conservative view, one of the reasons why concepts cannot reach awareness is because they always generalize over particular perspectives. This key insight is nicely captured by Prinz (2012, p. 74) in the passage quoted earlier, where he makes what is essentially the following argument: the concept of a chair is viewer-invariant, which is to say that it covers all possible vantage points; however, it is impossible to see or imagine a chair “from no vantage point” or “from multiple vantage points overlapping”; therefore, it is impossible to directly experience the concept of a chair, that is, “chairness” in the most general sense.
In another part of his book, Dehaene (2014, pp. 177–78) uses the example of Leonardo da Vinci's Mona Lisa to illustrate his idea that a conscious state is underpinned by millions of widely distributed neurons that represent different facets of the experience and that are functionally integrated through bidirectional, rapidly reverberating signals. Most importantly for present purposes, he claims that when we look at the classic painting, our global workspace of awareness includes not just its visual properties (e.g., the hands, eyes, and “Cheshire cat smile”), but also “fragments of meaning,” “a connection to our memories of Leonardo's genius,” and “a single coherent interpretation,” which he characterizes as “a seductive Italian woman.” This part of the book clearly reveals Dehaene's endorsement of the liberal view that concepts are among the kinds of information that can reach consciousness. The problem, however, is that he does not explicitly defend this position against the opposite conservative view, which denies that we can directly experience complex semantic structures like the one expressed by the phrase “a seductive Italian woman.” The meaning of the word seductive, for instance, is highly abstract, since it applies not only to the nature of Mona Lisa's smile, but also to countless other visual and non-visual stimuli that satisfy the conceptual criteria of, to quote from Webster's dictionary, “having tempting qualities.” On the one hand, it is reasonable to suppose that there is something it is inimitably like, phenomenologically speaking, to perceive particular instances of seductive stimuli, such as Mona Lisa's smile. But on the other hand, it is extremely hard to imagine how anyone could directly experience seductiveness in some sort of general, all-encompassing sense.

Which is an interesting point, in that before I read that, it felt clear to me that if I e.g. look at my laptop, I see "my laptop"... but now that I've read this and introspect on my experience of seeing my laptop, there's nothing that would make my mind spontaneously go "that's my laptop". Rather, the name of the object is something that's available to me if I explicitly query for it, but it's not around otherwise.

Which would seem to contradict (one reading of) Dehaene's model - namely the claim that when we see a laptop, the general concept of the laptop is somehow being passed around in the workspace in its entirety. My best guess so far would be to say that what gets passed around in our consciousness is something like a "pointer" (in a loose metaphoric sense, not in the sense of a literal computer science pointer) to a general concept, which different brain systems can then retrieve and synchronize around in the background. And they might be doing all kinds of not-consciously-experienced joint processing of the concept that's being pointed to, either on a level of workspace communication that isn't consciously available, or through some other communication channel entirely.

There's also been some work going under the name of the heterogeneity hypothesis of concepts, suggesting that the brain doesn't have any such thing as "the concept of a chair" or "the concept of a laptop". Rather, there are many different brain systems that store information in different formats for different purposes, and while many of them might have data structures that point to the same real-life thing, those structures are all quite different, mutually incompatible, and describe different aspects of the thing. So maybe there isn't a single "laptop" concept being passed around, but rather just some symbol which tells each subsystem to retrieve its own equivalent of "laptop" and do... something... with it.
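To make the "shared symbol, subsystem-specific representations" idea concrete — this is purely my own toy caricature, not a model from the heterogeneity literature — you could picture each subsystem keeping its own incompatible data structure, indexed by nothing more than a common tag:

```python
# Toy caricature: each subsystem stores its own representation of
# "laptop" in its own format; the only thing they share is the tag.
subsystems = {
    "visual": {"laptop": {"shape": "flat hinged box", "color": "grey"}},
    "motor": {"laptop": {"grip": "two-handed", "weight_kg": 1.5}},
    "verbal": {"laptop": {"label": "my laptop"}},
}

def broadcast(tag):
    # The "workspace" passes around only the tag; each subsystem
    # resolves it into its own, mutually incompatible structure.
    return {name: store.get(tag) for name, store in subsystems.items()}

views = broadcast("laptop")
# Each subsystem ends up with a different description of the same object.
```

On this caricature, nothing resembling a unified "laptop concept" ever travels through the workspace; only the tag does, and the rich content stays local to each subsystem.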

I dunno, I'm just speculating wildly. :)

Comment by kaj_sotala on Sequence introduction: non-agent and multiagent models of mind · 2019-01-17T10:02:43.736Z · score: 3 (1 votes) · LW · GW

Sounds like our posts could be nicely complementary, I encourage you to continue posting yours! And huh, you scooped me on the "agenthood is a leaky abstraction" idea, I didn't realize it had been previously used on LW. :)

Comment by kaj_sotala on Book Summary: Consciousness and the Brain · 2019-01-16T17:00:58.590Z · score: 3 (1 votes) · LW · GW

"Cognitive psychology is the field that consists of experimental cognitive psychology, cognitive neuroscience, cognitive neuropsychology, and computational cognitive science" was the breakdown used in my cognitive psychology textbook (a relatively influential one, cited 3651 times according to Google Scholar). There's also substantial overlap in the experimental setups: as with many of the experiments mentioned in the post, lots of cognitive neuroscience experiments are such that even if you removed the brain imaging part, the behavioral component could still stand on its own as an experimental cognitive psychology finding. Similarly, the book cites a combination of neuroimaging and behavioral results in order to build up its theory; many of the priming experiments that I discuss also show up in that list of replicated cognitive psychology experiments.

Re: the voodoo correlations paper - I haven't read it myself, but my understanding from online discussion is that the main error it discusses only modestly overstates the strength of some correlations; it doesn't cause entirely spurious correlations to be reported. The paper also separately discusses a more serious error, but names only a single paper guilty of it, which isn't very damning. So I see the paper mostly as an indication of the field being self-correcting, with flaws in its methodologies being pointed out and then improved upon.

Book Summary: Consciousness and the Brain

2019-01-16T14:43:59.202Z · score: 60 (16 votes)
Comment by kaj_sotala on Open Thread January 2019 · 2019-01-15T08:35:58.653Z · score: 23 (8 votes) · LW · GW

Relevant excerpt for why exactly it was rejected:

The standards for deserving publication in academic philosophy are relatively simple and self-explanatory. A paper should make a significant point, it should be clearly written, it should correctly position itself in the existing literature, and it should support its main claims by coherent arguments. The paper I read sadly fell short on all these points, except the first. (It does make a significant point.) [...]
I still think the paper could probably have been published after a few rounds of major revisions. But I also understand that the editors decided to reject it. Highly ranked philosophy journals have acceptance rates of under 5%. So almost everything gets rejected. This one got rejected not because Yudkowsky and Soares are outsiders or because the paper fails to conform to obscure standards of academic philosophy, but mainly because the presentation is not nearly as clear and accurate as it could be.

So apparently the short version of "why their account has a hard time gaining traction in academic philosophy" is (according to this author) just "the paper's presentation and argumentation aren't good enough for the top philosophy journals".

Comment by kaj_sotala on AlphaGo Zero and capability amplification · 2019-01-10T09:17:15.071Z · score: 5 (2 votes) · LW · GW
In the simplest form of iterated capability amplification, we train one function:
A “weak” policy A, which is trained to predict what the agent will eventually decide to do in a given situation.
Just like AlphaGo doesn’t use the prior p directly to pick moves, we don’t use the weak policy A directly to pick actions. Instead, we use a capability amplification scheme: we call A many times in order to produce more intelligent judgments. We train A to bypass this expensive amplification process and directly make intelligent decisions. As A improves, the amplified policy becomes more powerful, and A chases this moving target.
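The amplify-then-distill loop described above can be sketched very roughly as follows. This is only my own placeholder rendering: the function names, and the "amplify by aggregating several calls to A" step, are stand-in assumptions, not the actual scheme.

```python
def amplify(policy, situation, n_calls=5):
    # Placeholder amplification: call the weak policy several times and
    # aggregate, standing in for "call A many times in order to produce
    # more intelligent judgments".
    votes = [policy(situation) for _ in range(n_calls)]
    return max(set(votes), key=votes.count)

def distill(policy, situations):
    # Placeholder distillation: train A to predict the amplified
    # policy's decisions directly (here: just memorize them).
    table = {s: amplify(policy, s) for s in situations}
    return lambda s: table.get(s, policy(s))

# Start from a trivial weak policy and iterate: amplify, then distill,
# so that A keeps chasing the (moving) amplified target.
weak = lambda s: "act"
for _ in range(3):
    weak = distill(weak, situations=["s1", "s2"])
```

The point of the sketch is just the shape of the loop: the expensive amplified policy defines the training target, and the cheap policy is repeatedly retrained to match it.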

This is totally wild speculation, but the thought occurred to me whether the human brain might be doing something like this with identities and social roles:

A lot of (but not all) people get a strong hit of this when they go back to visit their family. If you move away and then make new friends and sort of become a new person (!), you might at first think this is just who you are now. But then you visit your parents… and suddenly you feel and act a lot like you did before you moved away. You might even try to hold onto this “new you” with them… and they might respond to what they see as strange behavior by trying to nudge you into acting “normal”: ignoring surprising things you say, changing the topic to something familiar, starting an old fight, etc. [...]
For instance, the stereotypical story of the worried nagging wife confronting the emotionally distant husband as he comes home really late from work… is actually a pretty good caricature of a script that lots of couples play out, as long as you know to ignore the gender and class assumptions embedded in it.
But it’s hard to sort this out without just enacting our scripts. The version of you that would be thinking about it is your character, which (in this framework) can accurately understand its own role only if it has enough slack to become genre-savvy within the web; otherwise it just keeps playing out its role. In the husband/wife script mentioned above, there’s a tendency for the “wife” to get excited when “she” learns about the relationship script, because it looks to “her” like it suggests how to save the relationship — which is “her” enacting “her” role. This often aggravates the fears of the “husband”, causing “him” to pull away and act dismissive of the script’s relevance (which is “his” role), driving “her” to insist that they just need to talk about this… which is the same pattern they were in before. They try to become genre-savvy, but there (usually) just isn’t enough slack between them, so the effort merely changes the topic while they play out their usual scene.

If you squint, you could kind of interpret this sort of dynamic as the result of the human brain trying to predict what it expects itself to do next, using that prediction to guide the search for next actions, and then ending up with next actions that have a strong structural resemblance to its previous ones. (Though I can think of possibly better-fitting models of this too; still, it seemed worth throwing out.)

Comment by kaj_sotala on Book Review: The Structure Of Scientific Revolutions · 2019-01-09T12:31:28.626Z · score: 6 (2 votes) · LW · GW
I think Kuhn’s answer is that facts cannot be paradigm-independent.

I liked this take on it, from Steven Horst’s Cognitive Pluralism, where he’s quoting some of Kuhn’s later writing (the italics are Horst quoting Kuhn):

A historian reading an out-of-date scientific text characteristically encounters passages that make no sense. That is an experience I have had repeatedly whether my subject is an Aristotle, a Newton, a Volta, a Bohr, or a Planck. It has been standard to ignore such passages or to dismiss them as products of error, ignorance, or superstition, and that response is occasionally appropriate. More often, however, sympathetic contemplation of the troublesome passages suggests a different diagnosis. The apparent textual anomalies are artifacts, products of misreading.
For lack of an alternative, the historian has been understanding words and phrases in the text as he or she would if they had occurred in contemporary discourse. Through much of the text that way of reading proceeds without difficulty; most terms in the historian’s vocabulary are still used as they were by the author of the text. But some sets of interrelated terms are not, and it is [the] failure to isolate those terms and to discover how they were used that has permitted the passages in question to seem anomalous. Apparent anomaly is thus ordinarily evidence of the need for local adjustment of the lexicon, and it often provides clues to the nature of that adjustment as well. An important clue to problems in reading Aristotle’s physics is provided by the discovery that the term translated ‘motion’ in his text refers not simply to change of position but to all changes characterized by two end points. Similar difficulties in reading Planck’s early papers begin to dissolve with the discovery that, for Planck before 1907, ‘the energy element hv’ referred, not to a physically indivisible atom of energy (later to be called ‘the energy quantum’) but to a mental subdivision of the energy continuum, any point on which could be physically occupied.
These examples all turn out to involve more than mere changes in the use of terms, thus illustrating what I had in mind years ago when speaking of the “incommensurability” of successive scientific theories. In its original mathematical use ‘incommensurability’ meant “no common measure,” for example of the hypotenuse and side of an isosceles right triangle. Applied to a pair of theories in the same historical line, the term meant that there was no common language into which both could be fully translated. (Kuhn 1989/2000, 9–10)
While scientific theories employ terms used more generally in ordinary language, and the same term may appear in multiple theories, key theoretical terminology is proprietary to the theory and cannot be understood apart from it. To learn a new theory, one must master the terminology as a whole: “Many of the referring terms of at least scientific languages cannot be acquired or defined one at a time but must instead be learned in clusters” (Kuhn 1983/2000, 211). And as the meanings of the terms and the connections between them differ from theory to theory, a statement from one theory may literally be nonsensical in the framework of another. The Newtonian notions of absolute space and of mass that is independent of velocity, for example, are nonsensical within the context of relativistic mechanics. The different theoretical vocabularies are also tied to different theoretical taxonomies of objects. Ptolemy’s theory classified the sun as a planet, defined as something that orbits the Earth, whereas Copernicus’s theory classified the sun as a star and planets as things that orbit stars, hence making the Earth a planet. Moreover, not only does the classificatory vocabulary of a theory come as an ensemble—with different elements in nonoverlapping contrast classes—but it is also interdefined with the laws of the theory. The tight constitutive interconnections within scientific theories between terms and other terms, and between terms and laws, have the important consequence that any change in terms or laws ramifies to constitute changes in meanings of terms and the law or laws involved with the theory [...]
While Kuhn’s initial interest was in revolutionary changes in theories about what is in a broader sense a single phenomenon (e.g., changes in theories of gravitation, thermodynamics, or astronomy), he later came to realize that similar considerations could be applied to differences in uses of theoretical terms between contemporary subdisciplines in a science (1983/2000, 238). And while he continued to favor a linguistic analogy for talking about conceptual change and incommensurability, he moved from speaking about moving between theories as “translation” to a “bilingualism” that afforded multiple resources for understanding the world—a change that is particularly important when considering differences in terms as used in different subdisciplines.

Sequence introduction: non-agent and multiagent models of mind

2019-01-07T14:12:30.297Z · score: 74 (26 votes)
Comment by kaj_sotala on Two More Decision Theory Problems for Humans · 2019-01-05T16:38:56.078Z · score: 5 (2 votes) · LW · GW

Valentine's The Art of Grieving Well, perhaps?

I’d like to suggest that grieving is how we experience the process of a very, very deep part of our psyches becoming familiar with a painful truth. It doesn’t happen only when someone dies. For instance, people go through a very similar process when mourning the loss of a romantic relationship, or when struck with an injury or illness that takes away something they hold dear (e.g., quadriplegia). I think we even see smaller versions of it when people break a precious and sentimental object, or when they fail to get a job or into a school they had really hoped for, or even sometimes when getting rid of a piece of clothing they’ve had for a few years.

In general, I think familiarization looks like tracing over all the facets of the thing in question until we intuitively expect what we find. I’m particularly fond of the example of arriving in a city for the first time: At first all I know is the part of the street right in front of where I’m staying. Then, as I wander around, I start to notice a few places I want to remember: the train station, a nice coffee shop, etc. After a while of exploring different alleyways, I might make a few connections and notice that the coffee shop is actually just around the corner from that nice restaurant I went to on my second night there. Eventually the city (or at least those parts of it) start to feel smaller to me, like the distances between familiar locations are shorter than I had first thought, and the areas I can easily think of now include several blocks rather than just parts of streets.

I’m under the impression that grief is doing a similar kind of rehearsal, but specifically of pain. When we lose someone or something precious to us, it hurts, and we have to practice anticipating the lack of the preciousness where it had been before. We have to familiarize ourselves with the absence.

When I watch myself grieve, I typically don’t find myself just thinking “This person is gone.” Instead, my grief wants me to call up specific images of recurring events — holding the person while watching a show, texting them a funny picture & getting a smiley back, etc. — and then add to that image a feeling of pain that might say “…and that will never happen again.” My mind goes to the feeling of wanting to watch a show with that person and remembering they’re not there, or knowing that if I send a text they’ll never see it and won’t ever respond. My mind seems to want to rehearse the pain that will happen, until it becomes familiar and known and eventually a little smaller.

I think grieving is how we experience the process of changing our emotional sense of what’s true to something worse than where we started.

Unfortunately, that can feel on the inside a little like moving to the worse world, rather than recognizing that we’re already here.

Comment by kaj_sotala on Will humans build goal-directed agents? · 2019-01-05T16:20:53.992Z · score: 9 (4 votes) · LW · GW

Humans want to build powerful AI systems in order to help them achieve their goals -- it seems quite clear that humans are at least partially goal-directed. As a result, it seems natural that they would build AI systems that are also goal-directed.

This is really an argument that the system comprising the human and AI agent should be directed towards some goal. The AI agent by itself need not be goal-directed as long as we get goal-directed behavior when combined with a human operator. However, in the situation where the AI agent is much more intelligent than the human, it is probably best to delegate most or all decisions to the agent, and so the agent could still look mostly goal-directed.

Even so, you could imagine that even the small part of the work that the human continues to do allows the agent to not be goal-directed, especially over long horizons.

An additional issue is that if you have a competitive situation, there may be an incentive to minimize the amount of human involvement in the system, in order to speed up response time and avoid losing ground to competitors. I discussed this a bit in Disjunctive Scenarios of Catastrophic AI Risk:

... the U.S. military is seeking to eventually transition to a state where the human operators of robot weapons are “on the loop” rather than “in the loop” (Wallach & Allen 2013). In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and interfere if something goes wrong. While this would allow the system to react faster, it would also limit the window that the human operators have for overriding any mistakes that the system makes. For a number of military systems, such as automatic weapons defense systems designed to shoot down incoming missiles and rockets, the extent of human oversight is already limited to accepting or overriding a computer’s plan of actions in a matter of seconds, which may be too little to make a meaningful decision in practice (Human Rights Watch 2012).

Sparrow (2016) reviews three major reasons which incentivize major governments to move toward autonomous weapon systems and reduce human control:

  1. Currently existing remotely piloted military “drones,” such as the U.S. Predator and Reaper, require a high amount of communications bandwidth. This limits the amount of drones that can be fielded at once, and makes them dependent on communications satellites which not every nation has, and which can be jammed or targeted by enemies. A need to be in constant communication with remote operators also makes it impossible to create drone submarines, which need to maintain a communications blackout before and during combat. Making the drones autonomous and capable of acting without human supervision would avoid all of these problems.
  2. Particularly in air-to-air combat, victory may depend on making very quick decisions. Current air combat is already pushing against the limits of what the human nervous system can handle: further progress may be dependent on removing humans from the loop entirely.
  3. Much of the routine operation of drones is very monotonous and boring, which is a major contributor to accidents. The training expenses, salaries, and other benefits of the drone operators are also major expenses for the militaries employing them.

Sparrow’s arguments are specific to the military domain, but they demonstrate the argument that “any broad domain involving high stakes, adversarial decision making, and a need to act rapidly is likely to become increasingly dominated by autonomous systems” (Sotala & Yampolskiy 2015, p. 18). Similar arguments can be made in the business domain: eliminating human employees to reduce costs from mistakes and salaries is something that companies would also be incentivized to do, and making a profit in the field of high-frequency trading already depends on outperforming other traders by fractions of a second. While the currently existing AI systems are not powerful enough to cause global catastrophe, incentives such as these might drive an upgrading of their capabilities that eventually brought them to that point.

In the absence of sufficient regulation, there could be a “race to the bottom of human control” where state or business actors competed to reduce human control and increased the autonomy of their AI systems to obtain an edge over their competitors (see also Armstrong et al. 2016 for a simplified “race to the precipice” scenario). This would be analogous to the “race to the bottom” in current politics, where government actors compete to deregulate or to lower taxes in order to retain or attract businesses.

AI systems being given more power and autonomy might be limited by the fact that doing this poses large risks for the actor if the AI malfunctions. In business, this limits the extent to which major, established companies might adopt AI-based control, but incentivizes startups to try to invest in autonomous AI in order to outcompete the established players. In the field of algorithmic trading, AI systems are currently trusted with enormous sums of money despite the potential to make corresponding losses—in 2012, Knight Capital lost $440 million due to a glitch in their trading software (Popper 2012, Securities and Exchange Commission 2013). This suggests that even if a malfunctioning AI could potentially cause major risks, some companies will still be inclined to invest in placing their business under autonomous AI control if the potential profit is large enough.

U.S. law already allows for the possibility of AIs being conferred a legal personality, by putting them in charge of a limited liability company. A human may register a limited liability company (LLC), enter into an operating agreement specifying that the LLC will take actions as determined by the AI, and then withdraw from the LLC (Bayern 2015). The result is an autonomously acting legal personality with no human supervision or control. AI-controlled companies can also be created in various non-U.S. jurisdictions; restrictions such as ones forbidding corporations from having no owners can largely be circumvented by tricks such as having networks of corporations that own each other (LoPucki 2017). A possible start-up strategy would be for someone to develop a number of AI systems, give them some initial endowment of resources, and then set them off in control of their own corporations. This would risk only the initial resources, while promising whatever profits the corporation might earn if successful. To the extent that AI-controlled companies were successful in undermining more established companies, they would pressure those companies to transfer control to autonomous AI systems as well.

Comment by kaj_sotala on 2018 AI Alignment Literature Review and Charity Comparison · 2019-01-04T14:45:30.583Z · score: 6 (2 votes) · LW · GW

I didn't look at that particular paper, but that definition sounds like a reasonable way of doing it, since that way your results apply to both stochastic and deterministic agents. A deterministic policy is a special case of a stochastic policy, where the distribution over actions assigns one action 100% probability of being taken and all other actions a 0% probability. So if you define policies as mapping from histories to distributions of actions, that allows for both deterministic and stochastic agents.
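The distinction above can be made concrete with a minimal sketch (the action names and probabilities here are hypothetical, purely for illustration):

```python
# A policy maps a history (of observations and actions) to a probability
# distribution over actions. A deterministic policy is the special case
# where one action gets probability 1 and all others get probability 0.
import random

def stochastic_policy(history):
    """Return a distribution over actions given the history so far."""
    return {"left": 0.3, "right": 0.7}

def deterministic_policy(history):
    """The deterministic special case: one action has probability 1."""
    return {"left": 1.0, "right": 0.0}

def sample_action(policy, history):
    """Sample an action from whatever distribution the policy returns."""
    dist = policy(history)
    actions, probs = zip(*dist.items())
    return random.choices(actions, weights=probs)[0]

# A deterministic policy always yields the same action for a given history:
print(sample_action(deterministic_policy, []))  # -> "left"
```

Since both kinds of agent are handled by the same sampling machinery, any result proved for history-to-distribution policies automatically covers the deterministic case.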

Comment by kaj_sotala on Why do Contemplative Practitioners Make so Many Metaphysical Claims? · 2019-01-02T16:47:28.386Z · score: 31 (9 votes) · LW · GW

My model has been that contemplative practice can make you:

  • realize that you've never actually experienced an objective reality, and that everything you've experienced is manufactured by your mind
  • realize that much of what you believe about the world is just social conditioning and belief in authority rather than anything that you would have investigated yourself
  • experience weird things, such as realistic hallucinations of talking with spirits

If your previous belief in science has basically been social conditioning or "science as attire", then once that social conditioning is stripped away, you might no longer have belief in it. And if your belief in science has been mostly social conditioning and you don't know anything about philosophy of science, how to evaluate evidence, etc., then this means that you've just had your existing structure of justification stripped away with nothing sane to replace it. This has the consequence that you can start making (and believing in) all kinds of crazy claims.

Comment by kaj_sotala on What makes people intellectually active? · 2018-12-31T12:43:31.902Z · score: 45 (18 votes) · LW · GW

I think that a lot of the answers here are touching upon aspects of the same thing: feedback and reward. The idea-generating mechanisms in our subconscious respond to rewards. If you want to have lots of ideas about something, try to ensure that it either feels as rewarding as possible intrinsically, or that you also get social rewards for them, or both.

  • Eli notes that writing up your thoughts is useful for iterating on them: but a lot of people also find that writing down your ideas causes you to have even more ideas. I think that a part of this is that writing the ideas down can feel rewarding by itself, and if you also iterate them further and enjoy it, then that's also a form of reward that can be propagated to the initial act of having thought them up.
  • Abram's response is that an intellectual community or receptive audience is a big factor. This also matches: if you get to have enjoyable conversations about your ideas, that's rewarding, and it also gives you leads for new ideas. E.g. if someone didn't understand your explanation, it feels rewarding to have an idea of how you could explain it better; even more so if you actually succeed in explaining it.
  • John suggests reinforcing yourself for ideas regardless of their quality, and getting excited about having produced them. The connection here is presumably obvious. Also, "Most people are too judgemental of their ideas" is an important bit: avoiding negative reinforcement for generating ideas can be as important as having positive reinforcement. If for each idea you generate you automatically think "but this idea sucks", then that's a negative valence associated with idea-generation, which can quickly stop you from coming up with anything at all.
  • Cousin It, in the comments: "sometimes people end up in some activity because they accidentally had a defining moment of fun with that activity". Yup, if an activity is fun, then you are going to enjoy thinking about it, and will also generate ideas related to it.
  • Personally I've noticed that if I've been getting rewarded for having ideas related to something, then my mind will automatically be scanning things for anything that could contribute to it. For example, right now I'm working on several essays by gradually sketching out what I want to say in them; I enjoy this phase, and have liked it whenever I come up with a new idea that I could use in one of the essays. As a consequence, I feel like my mind is very inclined to notice new instances of the things that I'm writing about, and suggest using that in my essay. ("Hey, the way the person in front of you at the grocery store did X, that's an example of the psychological phenomenon you're writing about.") When I write that down, that process gets a bit of a reward and reinforces the act of scanning everything for its usefulness in these particular essays. So it's not just that rewards can reinforce your creativity in general; they can also reinforce your creativity as related to a specific project.

Here's a quote from the author Lawrence Block, writing in his book "Writing the Novel from Plot to Print to Pixel":

... the reception one’s ideas receive has a good deal to do with the development of future ideas.
An example: my longtime friend and colleague, the late Donald E. Westlake, had a period in the mid-1960s when he kept getting ideas for short stories about relationships. (That his own relationships were in an uncertain state at the time may have had something to do with this, but never mind.) He wrote three or four stories, one right after the other, and he sent them to his agent, who admired them greatly and submitted them to markets like Redbook and Cosmopolitan and Playboy and the Saturday Evening Post. All of the editors who saw the stories professed admiration for them, but nobody liked any of them well enough to buy it, and the stories went unpublished.
And Don stopped having ideas. He didn’t regret having written the stories, and he would have been perfectly happy to write more even with no guarantee of success, but the idea factory in his unconscious mind added things up and decided the hell with it. It was clear to Don that, if one or two of those stories had sold, he’d have had ideas for more. But they hadn’t, and he didn’t.
On the other hand, consider Walter Mosley. Shortly after the very successful 1990 publication of his first crime novel, Devil in a Blue Dress, Walter appeared on a panel at a mystery convention in Philadelphia. He announced that he probably didn't belong there, that this book was an anomaly, that it was actually highly unlikely that he'd write any more books within the confines of the genre.
This was certainly not a pose. He very clearly believed what he was saying. Since then, however, he’s written and published a dozen more Easy Rawlins mysteries, three Fearless Jones mysteries, and five Leonid McGill mysteries, along with close to two dozen other books, most of them novels. While a cynic might simply contend that Walter has gone where the money is, I know the man too well to believe commercial considerations outweigh artistic ones for him.
The ongoing success of the Easy Rawlins books has made it almost inevitable that his unconscious would come up with a succession of ideas for additional books. They've been good ideas, engaging their author even as they've engaged an increasing audience of readers, and it would have been a great betrayal of self not to have gone on writing them.

Comment by kaj_sotala on Boundaries enable positive material-informational feedback loops · 2018-12-23T22:39:51.063Z · score: 3 (1 votes) · LW · GW

This is not the definition I'm using.

Got it. The way you used it in a sentence without specifically defining it first, and after having referenced economics earlier, made it unclear whether it was supposed to be understood in the economics sense or not.

Comment by kaj_sotala on Best arguments against worrying about AI risk? · 2018-12-23T16:56:32.622Z · score: 11 (7 votes) · LW · GW

Note that "best arguments against worrying about AI risk" and "best arguments against entering the AI risk field" are distinct issues. E.g. suppose that AI risk was the #1 thing we should be worrying about and biorisk was the #2 thing we should be worrying about. Then someone who already had a strong interest in biology might be better off entering the biorisk field because of the higher personal fit:

... the most successful people in a field account for a disproportionately large fraction of the impact. [...] Personal fit is like a multiplier of everything else, and this means it’s probably more important than the other three factors [of career capital, altruistic impact, and supportive job conditions]. So, we’d never recommend taking a “high impact” job that you’d be bad at.

Comment by kaj_sotala on Boundaries enable positive material-informational feedback loops · 2018-12-23T15:08:23.003Z · score: 5 (2 votes) · LW · GW

Since no subsystem of the world is causally closed, all positive feedback loops have externalities. By definition, the outside world is only directly affected by these externalities, and is only affected by what happens within the boundary to the extent that this eventually leads to externalities. A wise designer of a positive feedback loop will anticipate its externalities, and set it up such that the externalities are overall desirable to the designer. After all, there is no point to creating a positive feedback loop unless its externalities are mostly positive.

I'm confused by this paragraph. The standard economics definition of an externality is "the cost or benefit that affects a party who did not choose to incur that cost or benefit" (Wikipedia), so I'm interpreting externality here to mean effects that leak out through the boundary to the outside world. But why then "there is no point to creating a positive feedback loop unless its externalities are mostly positive"? You don't need the effects of the feedback loop that leak out of the boundary to be overall positive; it's enough for the effects that remain within the boundary (which are the ones that you capture) to be overall positive. After all, the standard economics definition of an "externality" is something that isn't directly relevant for the agents causing it, but you seem to be using the term to refer to something that the agents constructing the feedback loop do need to take into account.

Comment by kaj_sotala on 18-month follow-up on my self-concept work · 2018-12-22T20:14:10.136Z · score: 8 (2 votes) · LW · GW

Thank you for your kind words! 

I don't know of a definite explanation; some ideas that come to mind would be:

1. I've seen one theory suggesting that this kind of thoughtfulness and trauma sensitivity are linked together, in a personality type which the writer describes as a "brain wired for danger" - a combination of strong pattern-recognition abilities (with overlap on the autism spectrum), a powerful drive to understand the world, and a sensitivity to signals from the outside world, particularly signals which might indicate threats. 

If you are super-sensitive to noticing signals of "behavior X is bad", then you might also be super-sensitive to noticing when you yourself engage in a bad behavior (a sensitivity which might be useful in some environments), leading to a higher-than-average probability of developing feelings of shame etc. But at the same time it could also drive prosocial behavior, if you also notice *good* behavior and what other people might need.

I find particularly interesting the list of traits that you get if you search for the phrase "In my experience with people with primary CAPS who present with psychiatric concerns I find" in the article I linked above. Some people had the reaction that this reads as a horoscope - lots of different traits, so that some are guaranteed to match with anyone. But if you just tossed out a lot of different traits, then that would increase the probability that *some* of them matched for everyone, while drastically reducing the probability that they would *all* match for someone. My experience reading that list was that I match *all* of them, with some of them being pretty specific things which I've struggled with for a long time. Which isn't something you would expect for a horoscope-like "a bit of everything thrown together" list... *and* I can also think of a few other people who match this list better than you might expect by random chance:

In my experience with people with primary CAPS who present with psychiatric concerns I find
*Anxiety/high arousal during times of stress (evidence of high adrenaline) with resulting insomnia, sometimes manic-like characteristics, “tired and wired”, “turbo” (hyperfocus/adrenaline/acute stress response/orienting response)
*Extreme hyperfocus, can get into "flow" and even forget to urinate (children will wet self sometimes)
*Evidence of increased sympathetic nervous system tone and blood pooling (if hypermobile): dilated pupils, increased startle, livedo reticularis, bluish toes, sitting with legs wrapped around each-other, complaints of temperature dysregulation, hunched over, rocking and fidgeting
*Under-arousal during times of low stress (often leading diagnosis of ADD) usually associated with behaviors to modify this state: compulsive (sometimes addictive) behaviors like eating, substance abuse and sometimes, thrill-seeking behaviors, etc. to increase arousal
*High sensitivity/reactivity/emotionality often associated with intruding environmental stimuli when trying to hyper-focus or sleep, sensory processing disorders which present with over-stimulation in places with an excess of noise, stimulation.  Of note, can be elated in high stimuli environments, if still finding the excess adrenaline to be a “fun rush”
*A remarkable ability to read emotions in others (which can be overwhelming for some), but often social awkwardness and inappropriateness in response ("mind-blindness"-thinking others know how they are feeling and assuming others see the world the way they do.) Difficulty with social rhythms, eye contact, speaking in turn.  Very upset by interpersonal cruelty.
*Can be easily traumatized by "small" events (horror movie or off-hand comment by another person for example) once the adrenaline is no longer thrilling
*Easily distracted when not hyper-focused, shiny object syndrome (sometimes meets criteria for ADD), hyper-vigilant
*A tendency toward non-conformity due to different priorities than others
*Special abilities (the type varies-but often includes very gifted musicians, scientists, etc...) due to highly developed circuits not found in others, coupled with hyper-focus and obsession with certain areas of interest (orienting response), often leading to remarkable accomplishments.
*Exceptionally good at processing large amounts of information and reaching conclusions, picking out patterns, seeing small details others miss  

2. Even without the specific hypothesis mentioned above, it seems easy to imagine there being connections between feelings of shame on the one hand, and a drive for introspection and pro-sociality on the other. If you have lots of shame, you might be more motivated to figure out how to avoid accumulating more of it and to fix your perceived shamefulness; and on the other hand, the more motivated you are to figure out how you think you should act, the easier time you might have noticing whenever you act badly.

As described by the self-concept book, a self-concept for something is basically something which instantiates a set of patterns, and then chimes whenever you act in a way that matches the pattern. (And also actively drives you to act in ways which correspond to the pattern.) So if you've developed a finely honed pattern-detector for "this is bad behavior", then whenever you do something which resembles that pattern, it reinforces your image of yourself as a bad person. If you haven't picked up positive self-concepts as well, then it's possible for you to do fifty nice things and two mean things in a day, and only have your "bad behavior" detector ring a bell, leaving you with the feeling of being a terrible person.

3. Lots of shame has its origins in childhood, and childhood in Western countries is often not very kind to introspective and thoughtful people, especially not if those people are boys. Those are "nerd" traits, and even if the social environment didn't actively consider them harmful, having interests in that sphere trades off from time and effort that could be spent on learning social skills and becoming more popular.


All of this is just random speculation though; I don't think that I actually know what the real answer is. :)

Comment by kaj_sotala on 2018 AI Alignment Literature Review and Charity Comparison · 2018-12-19T21:10:32.125Z · score: 8 (9 votes) · LW · GW
In the past [EAF/FRI] have been rather negative utilitarian, which I have always viewed as an absurd and potentially dangerous doctrine. If you are interested in the subject I recommend Toby Ord’s piece on the subject. However, they have produced research on why it is good to cooperate with other value systems, making me somewhat less worried.

(I work for FRI.) EAF/FRI is generally "suffering-focused", which is an umbrella term covering a range of views; NU would be the most extreme form of that, and some of us do lean that way, but many disagree with it and hold some view which would be considered much more plausible by most people (see the link for discussion). Personally I used to lean more NU in the past, but have since then shifted considerably in the direction of other (though still suffering-focused) views.

Besides the research about the value of cooperation that you noted, this article discusses reasons why the expected value of x-risk reduction could be positive even from a suffering-focused view; the paper of mine referenced in your post also discusses why suffering-focused views should care about AI alignment and cooperate with others in order to ensure that we get aligned AI.

And in general it's just straightforwardly better and (IMO) more moral to try to create a collaborative environment where people who care about the world can work together in support of their shared points of agreement, rather than trying to undercut each other. We are also aware of the unilateralist's curse and do our best to discourage any other suffering-focused people from doing anything stupid.

18-month follow-up on my self-concept work

2018-12-18T17:40:03.941Z · score: 54 (14 votes)
Comment by kaj_sotala on Player vs. Character: A Two-Level Model of Ethics · 2018-12-15T17:57:41.492Z · score: 4 (2 votes) · LW · GW

John Nerst also had a post Facing the Elephant, which had a nice image illustrating our strategic calculation happening outside the conscious self:

Comment by kaj_sotala on Multi-agent predictive minds and AI alignment · 2018-12-13T19:39:44.411Z · score: 11 (5 votes) · LW · GW

It seems plausible the common formalism of agents with utility functions is more adequate for describing the individual “subsystems” than the whole human minds. Decisions on the whole mind level are more like results of interactions between the sub-agents; results of multi-agent interaction are not in general an object which is naturally represented by utility function. For example, consider the sequence of game outcomes in repeated PD game. If you take the sequence of game outcomes (e.g. 1: defect-defect, 2:cooperate-defect, ... ) as a sequence of actions, the actions are not representing some well behaved preferences, and in general not maximizing some utility function.

I just want to highlight this as what seems to me a particularly important and correct paragraph. I think it manages to capture an important part of the reason why I think that modeling human values as utility functions is the wrong approach, which I hadn't been able to state as clearly and concisely before.
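The repeated-PD point can be illustrated with a toy sketch (the strategies and action labels here are hypothetical, chosen only to show how outcome sequences emerge from interaction):

```python
# The sequence of joint outcomes in a repeated Prisoner's Dilemma is produced
# by the *interaction* of two strategies, so in general it need not look like
# the maximization of any single fixed utility function over outcomes.

def tit_for_tat(opponent_history):
    # Cooperate first, then mirror the opponent's previous move.
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=4):
    hist_a, hist_b, outcomes = [], [], []
    for _ in range(rounds):
        move_a = strategy_a(hist_b)  # each strategy sees only the opponent's history
        move_b = strategy_b(hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
        outcomes.append((move_a, move_b))
    return outcomes

print(play(tit_for_tat, tit_for_tat))    # [('C', 'C'), ('C', 'C'), ('C', 'C'), ('C', 'C')]
print(play(tit_for_tat, always_defect))  # [('C', 'D'), ('D', 'D'), ('D', 'D'), ('D', 'D')]
```

The same strategy produces entirely different outcome trajectories depending on who it interacts with, which is the sense in which the whole-system behavior isn't naturally summarized by one utility function even when each component is simple and well-behaved.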

Comment by kaj_sotala on Measly Meditation Measurements · 2018-12-13T16:01:38.729Z · score: 3 (1 votes) · LW · GW

For me, being able to lean my back against something fixes the thing about legs going to sleep.

Comment by kaj_sotala on Tentatively considering emotional stories (IFS and “getting into Self”) · 2018-12-11T21:19:44.192Z · score: 3 (1 votes) · LW · GW

Yay! <3

Comment by kaj_sotala on Measly Meditation Measurements · 2018-12-11T18:06:04.164Z · score: 5 (2 votes) · LW · GW

That's definitely true as well - a lot of the depressed people are putting up an act because that's what (they feel) is expected of them.

Comment by kaj_sotala on Tentatively considering emotional stories (IFS and “getting into Self”) · 2018-12-11T11:36:11.874Z · score: 3 (1 votes) · LW · GW

Or you could think of this as your emotions arising from internal processes which are not under your conscious control, nor under the conscious (or even “conscious”, in some metaphorical sense) control of any “part” or “configuration” of yourself. This view has the virtue of actually being true.

I'm not sure this is so much disagreeing as expressing the same point in a different language. "Humans are not agents; rather, they are made up of different systems, only some of which are under conscious control" feels like it's talking about exactly the same point that I'm trying to point at when I say things like "humans are not unified agents". I just use terms like "parts" rather than "internal processes", but I would have no objection to using "internal processes" instead.

That said, as shminux suggests, there does still seem to be a benefit in using intentional language in describing some of these processes - for the same reason why it might be useful to use intentional language for describing a chess robot, or a machine-learning algorithm.

E.g. this article describes a reinforcement learning setup, consisting of two "parts" - a standard reinforcement learner, and separately a "Blocker", which is trained to recognize actions that a human overseer would disapprove of, and to block the RL component from taking actions which would be disapproved of. The authors use intentional language to describe the interaction of these two "subagents":

The Road Runner results are especially interesting. Our goal is to have the agent learn to play Road Runner without losing a single life on Level 1 of the game. Deep RL agents are known to discover a "Score Exploit'' in Road Runner: they learn to intentionally kill themselves in a way that (paradoxically) earns greater reward. Dying at a precise time causes the agent to repeat part of Level 1, where it earns more points than on Level 2. This is a local optimum in policy space that a human gamer would never be stuck in.

Ideally, our Blocker would prevent all deaths on Level 1 and hence eliminate the Score Exploit. However, through random exploration the agent may hit upon ways of dying that "fool" our Blocker (because they look different from examples in its training set) and hence learn a new version of the Score Exploit. In other words, the agent is implicitly performing a random search for adversarial examples for our Blocker (which is a convolutional neural net).

This sounds like a reasonable way of describing the interaction of those two components in a very simple machine learning system. And it seems to me that the parts of the mind that IFS calls "Protectors" are something like the human version of what this paper calls "Blockers" - internal processes with the "goal" of recognizing and preventing behaviors that look similar to ones that had negative outcomes before. At the same time, there are other processes with a "goal" of doing something else (the way that the RL agent's goal was just maximizing reward), which may have an "incentive" of getting around those Protectors/Blockers... and which could be described as running an adversarial search to get around the Protectors/Blockers. And this can be a useful way of modeling some of those interactions between processes in a person's psyche, and sorting out personal problems.
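A minimal sketch of this dynamic (everything here is hypothetical: a real Blocker is a trained classifier, while this one is a hard-coded set, which is precisely why both can be fooled by bad actions outside their "training data"):

```python
# A reward-seeking component proposes actions; a Blocker vetoes ones that
# match known-bad examples. By repeatedly proposing, the agent implicitly
# searches for actions that slip past the Blocker.

BLOCKED = {"jump_off_cliff"}  # known-bad actions the "overseer" disapproved of

def blocker(action):
    """Approve an action unless it matches a known-bad example."""
    return action not in BLOCKED

def rl_agent(candidate_actions):
    """Propose actions in (estimated) reward order until one is approved."""
    for action in candidate_actions:
        if blocker(action):
            return action
    return "no_op"

# The highest-reward action is vetoed, so the agent falls through to the
# next-best approved action:
print(rl_agent(["jump_off_cliff", "collect_coin"]))  # -> "collect_coin"
# A novel self-destructive action absent from BLOCKED slips through:
print(rl_agent(["run_into_enemy", "collect_coin"]))  # -> "run_into_enemy"
```

The second call is the toy analogue of the adversarial search the paper describes: the agent's exploration eventually stumbles on "ways of dying" that the Blocker's training never covered.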

All of this is using intentional language to describe the functioning of processes within our minds, but it's also not in any way in conflict with the claim that we are not really agents. If anything, it seems to support it.

Comment by kaj_sotala on Book review: Artificial Intelligence Safety and Security · 2018-12-11T09:29:24.389Z · score: 4 (2 votes) · LW · GW

Only some of them are online; the previous review had their full names for ease of Googling and links to some.

Comment by kaj_sotala on Measly Meditation Measurements · 2018-12-11T07:49:33.834Z · score: 21 (7 votes) · LW · GW

Relatedly, I suspect that other people are on average just really bad at picking up internal details about someone. For instance, it's common for people to have depression for years without their friends noticing; people might even be driven to the point of suicide without anyone else suspecting a thing before that.

I think that in general, any outside demeanor is compatible with a huge range of different subjective experiences. Variations in the internal experience will only be noticed to the extent that it's reflected in some pretty narrow set of variables that we're picking up on.

Comment by kaj_sotala on LW Update 2018-11-22 – Abridged Comments · 2018-12-10T06:50:10.707Z · score: 11 (6 votes) · LW · GW

Whichever way you're sorting comments, default-expanded means that if you're quickly perusing the thread and not committed to reading through the whole thing, you basically just get to read the first couple conversations, and those conversations aren't necessarily the ones most relevant to you.

I don't feel like the collapsed comments help with this issue, though - they just make it even less likely that I would read more comments, since reading them requires more work (additional clicks), and if I'm not already invested in it then I'm more likely to just shrug and go do something else after maybe reading a few of the top comments.

Trying to read collapsed comments feels actively annoying: if I was skimming them myself, my brain would automatically determine how much of it I wanted to read, and I could just skim through the whole comment in order to quickly see if there's anything in it that looks interesting. Not being able to do either of those means that I need to first expand the comment in order to determine whether it's worth reading. (The only exception to that is if it was a part of a subthread which I'd already determined was uninteresting - in which case I would have used the "hide subthread" feature already.)

Comment by kaj_sotala on Tentatively considering emotional stories (IFS and “getting into Self”) · 2018-12-06T08:11:05.171Z · score: 6 (3 votes) · LW · GW

I actually don't really experience my parts as full-fledged subpersonalities either, though I know some people who do. But if I intentionally try to experience them as such, it seems to make IFS work worse, as I'm trying to fit things into its assumptions rather than experiencing things the way that my mind actually works. "Shards of belief and expectations" seems to be how they manifest in my mind.

Comment by kaj_sotala on Playing Politics · 2018-12-05T08:11:38.168Z · score: 33 (17 votes) · LW · GW

And I probably shouldn’t have expected an event to coalesce naturally from the mailing list. I have a strong “egalitarian” instinct that if I’m trying to do something with a group and in some sense for the benefit of everyone in the group, then I shouldn’t be too “bossy” in terms of unilaterally declaring what we’re all going to do. But if I leave it up to the group to discuss, it seems like they generally…don’t.

This reminds me of a habit that I used to have for a long time, and which I'm still unlearning: when asked for a choice (like "what should we eat" or "which of these meeting places do you prefer"), I frequently replied with some variant of "no preference". And I used to think that this was polite - that I was giving the other person the choice.

But frequently the other person just wants a decision. If they ask me to decide, and I push the decision back to them, I'm not being polite, I'm refusing to cooperate. Someone has to make the decision eventually, but if everyone defers it to someone else, it's not ever going to happen.

This becomes even more obvious in situations with more than two people trying to e.g. decide what to do or where to go together; frequently everyone tries to be polite and not express too strong of an opinion. The result is that it takes a long time to make a decision, with everyone making tentative suggestions and nobody expressing a firm opinion.

As a result, I've been trying to reframe things in my head - to internalize that making decisions is a chore, and being the one to say "okay, let's go with this" if people seem undecided is doing them a favor. It's still frequently unpleasant, since there's that lingering doubt of did someone dislike this decision, are they unhappy with me when they would actually have preferred something else but just didn't speak up...

But it being unpleasant is exactly why freeing others from the burden of doing it is doing them a favor.

Your mailing list example seems similar - spending time discussing the right time, or even proposing times that they might like, requires paying a cost of time and attention. The correct way to think about it, I believe, is that you'll do people a favor by reducing the amount of effort that they need to spend in order to participate. If you just propose a few times that are good for you, that people can say yes or no to, then that costs them much less and is more likely to get a response. I thought that this article put it pretty well:

Ever wonder why people reply more if you ask them for a meeting at 2pm on Tuesday, than if you offer to talk at whatever happens to be the most convenient time in the next month? The first requires a two-second check of the calendar; the latter implicitly asks them to solve a vexing optimisation problem.

Comment by kaj_sotala on Tentatively considering emotional stories (IFS and “getting into Self”) · 2018-12-05T07:31:15.031Z · score: 4 (2 votes) · LW · GW

Thanks for letting me know that it's been useful! Very happy to hear that.

Comment by kaj_sotala on Tentatively considering emotional stories (IFS and “getting into Self”) · 2018-12-02T14:20:25.407Z · score: 12 (3 votes) · LW · GW

I mean… are you working on the basis of an assumption that an “agent” can only have one desire? That seems to pretty clearly not describe humans! Or do you perhaps mean that it is possible to decide that you will act on one desire and not another, and—unless interfered with, somehow (perhaps by some opposing internal sub-agents), thereby, in virtue of that conscious decision, to cause yourself to do that act? Well, once again all I can say is that this is (in my experience) simply not how humans work. Again I see no need to posit multiple selves in order to explain this. [...] That those desires and preferences are occasionally in conflict with one another, does not at all undermine that sense of a unitary self.

I feel like this is conflating two different senses of "mysterious":

  1. How common this is among humans. It indeed is how humans work, so in that sense it's not particularly mysterious.
  2. Whether it's what the assumption of a unitary self would predict. If the assumption of a unitary self wouldn't predict it, but humans nonetheless act that way, then it's mysterious if we are acting on the assumption of humans having unitary selves.

So then the question is "what would the assumption of a unitary self predict". That requires defining what we mean by a unitary self. I'm actually not certain what exactly people have in mind when they say that humans are unified selves, but my guess is that it comes from something like Dennett's notion of the Self as a Center of Narrative Gravity. We consider ourselves to be a single agent because that's what the narrative-making machinery in our heads usually takes as an axiom, so our sense of self is that of being one. Now if our sense of self is a post-hoc interpretation of our actions, then that doesn't seem to predict much in particular (at least in the context of the procrastination thing) so this definition of "a sense of unitary self", at least, is not in conflict with what we observe. (I don't know whether this is the thing that you have in mind, though.)

Under this explanation, it seems like there are differences in how people's narrative-making machinery writes its stories. In particular, there's a tendency for people to take aspects of themselves that they don't like and label them as "not me", since they don't want to admit to having those aspects. If someone does this kind of a thing, then they may be more likely to end up with a narrative where the thing about "when I procrastinate, it's as if I want to do one thing but another part of me resists". I think there are also neurological differences that may produce a less unitary-seeming story: alien hand syndrome would be an extreme case, but I suspect that even people who are mostly mentally healthy may have neurological properties that tend to make their narrative more "part-like".

In any case, if someone has a "part-like" narrative, where their narrative is in terms of different parts having different desires, then it may be hard for them to imagine a narrative where someone had conflicting desires that all emerged from a single agent - and vice versa. I guess that might be the source of the mutual incomprehension here?

On the other hand, when I say that "humans are not unitary selves", I'm talking on a different level of description. (So if one holds that we're unified selves in the sense that some of us have a narrative of being one, then I am not actually disagreeing when I say that we are not unified agents in my sense.) My own thinking goes roughly along the lines of that outlined in Subagents are Not a Metaphor:

Here are the parts composing my technical definition of an agent:

  1. Values: This could be anything from literally a utility function to highly framing-dependent. Degenerate case: embedded in lookup table from world model to actions.
  2. World-Model: Degenerate case: stateless world model consisting of just sense inputs.
  3. Search Process: Causal decision theory is a search process. “From a fixed list of actions, pick the most positively reinforced” is another. Degenerate case: lookup table from world model to actions.

Note: this says a thermostat is an agent. Not figuratively an agent. Literally technically an agent. Feature not bug.
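To make the "thermostat is literally an agent" point concrete, here's a minimal sketch of a thermostat expressed as the three components of the quoted definition. The class and method names are my own illustration, not from the quoted post:

```python
class Thermostat:
    """A degenerate agent: values, world-model, and search process, all minimal."""

    def __init__(self, target=21.0):
        self.target = target      # 1. Values: a single preferred temperature
        self.sensed_temp = None   # 2. World-model: stateless, just the latest sense input

    def sense(self, temperature):
        self.sensed_temp = temperature

    def act(self):
        # 3. Search process: a degenerate lookup from world-model to action
        if self.sensed_temp is None:
            return "off"
        return "heat" if self.sensed_temp < self.target else "off"
```

Each component is the degenerate case named in the definition, yet the whole thing still maps sense inputs through values to actions, which is all the definition requires.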

I think that humans are not unitary selves, in that they are composed of subagents in this sense. More specifically, I would explain the procrastination thing as something like "different subsystems for evaluating the value of different actions, are returning mutually inconsistent evaluations about which action is the best, and this conflict is consciously available".

Something like IFS would be a tool for interfacing with these subsystems. Note that IFS does also make a much stronger claim, in that there are subsystems which are something like subpersonalities, with their own independent memories and opinions. Believing in that doesn't seem to be necessary for making the IFS techniques work, though: I started out thinking "no, my mind totally doesn't work like that, it describes nothing in my experience". That's why I stayed away from IFS for a long time, as its narrative didn't fit mine and felt like nonsense. But then when I finally ended up trying it, the techniques worked despite me not believing in the underlying model. Now I'm less sure of whether it's just a fake framework that happens to mesh well with our native narrative-making machinery and thus somehow makes the process work better, or whether it's pointing to something real.

Comment by kaj_sotala on Tentatively considering emotional stories (IFS and “getting into Self”) · 2018-11-30T18:56:58.563Z · score: 5 (2 votes) · LW · GW

Well, the bit about parts and stories was meant somewhat metaphorically, though I've seen that metaphor used commonly enough that I forgot that it's not universally known and should be flagged as a metaphor. "Story" here was meant to refer to something like "your current interpretation of what's going on". So the experience it was meant to refer to was, less metaphorically, just the thing in my previous comment: "at that moment, experiencing the other person as a terrible one with no redeeming qualities".

Upon consideration, I think I wrote this thing too much in the specific context of 1) one specific SSC post 2) a particular subthread under that post, and would have needed to explain this whole thing about "parts" a lot more when it was stripped from that context. Might have been a mistake to post this in its current form; moved it from the front page to personal blog, might end up deleting it and later replacing it with a better writeup.

Comment by kaj_sotala on Tentatively considering emotional stories (IFS and “getting into Self”) · 2018-11-30T14:54:02.670Z · score: 4 (2 votes) · LW · GW

Ah, hmm. Is it more recognizable if you leave out the bit about "a part of your mind"? That is, do emotional states sometimes make you think things that feel objectively true at the moment, but which seem incorrect when not in that emotional state?

Comment by kaj_sotala on Tentatively considering emotional stories (IFS and “getting into Self”) · 2018-11-30T13:07:41.810Z · score: 8 (4 votes) · LW · GW


I should also add that although IFS is originally a psychotherapy thing, I don't really view the-thing-I-described-in-this-post as a "how to fix something that's broken" thing (although it does do that too). I rather view it as "how to more optimally incorporate all parts of your brain in your decision-making", i.e. as an instrumental rationality technique.

Comment by kaj_sotala on Tentatively considering emotional stories (IFS and “getting into Self”) · 2018-11-30T12:55:12.688Z · score: 3 (1 votes) · LW · GW

Which part, exactly? I assume that you mean more than just the quoted bit, since it seems self-evident to me that one can have the experience of being angry at someone and ignoring all of their redeeming qualities regardless of what theoretical frameworks one happens to believe in.

If you mean the distinction of blended / partially blended / unblended, sure. I think I'd had experiences of all three states before learning about IFS. Though I didn't have separate concepts for them, and I only learned how to more reliably get into an unblended state after learning to apply the techniques in the IFS book.

Comment by kaj_sotala on Tentatively considering emotional stories (IFS and “getting into Self”) · 2018-11-30T12:32:03.492Z · score: 15 (7 votes) · LW · GW

That fix has continued to work for fixing the specific issues described in that post (specifically, shame and anxiety stemming from feelings of being a horrible person). I basically haven't felt like a horrible person since implementing those fixes. That's a distinct issue from e.g. the kinds of social insecurities I'm discussing in this post, where a part of my mind is nervous that even though I don't think that I'm a horrible person, others might.

I actually touch upon this distinction in the post that you linked to:

Suppose that you have an unstable self-concept around “being a good person”, and you commit some kind of a faux pas. Or even if you haven’t actually committed one, you might just be generally unsure of whether others are getting a bad impression of you or not. Now, there are four levels on which you might feel bad about the real or imagined mistake:
1. Feeling bad because you think you’re an intrinsically bad person
2. Feeling bad because you suspect others think bad of you and that this is intrinsically bad (if other people think bad of you, that’s terrible, for its own sake)
3. Feeling bad because you suspect others think bad of you and that this is instrumentally bad (other people thinking bad of you can be bad for various social reasons)
4. Feeling bad because you might have hurt or upset someone, and you care about what others feel
Out of these, #3 and #4 are reasonable, #1 and #2 less so. When I fixed my self-concept, reaction #1 mostly vanished. But interestingly, reaction #2 stuck around for a while… or at least, a fear of #2 stuck around for a while.

The kind of thing (among others) that IFS seems to help for, is updating the parts of my mind that have incorrect assumptions relating to #3 and #4, which is an issue that I never said the self-concept work affected.

Of course, you are correct that at the time of writing that post, issues #1 and #2 had been so dominant in my mind that I had not been aware of #3 and #4 also including a dimension that would need to be addressed. I expect that likewise, addressing the issues that I can address with IFS will bring up even more issues which will require some other tool to address. But going from issues #1 and #2 to the milder issues of incorrect beliefs relating to #3 and #4 was still a substantial improvement to my life; likewise, I expect that going from here to some new subtle issue that's previously been overshadowed by the more serious ones will also be an improvement, even if it doesn't fix literally everything.

I don't expect there to be such a thing as achieving a state where absolutely everything has been fixed, since that would imply that my mind is completely optimal and there's absolutely nothing about it that can be improved. That doesn't seem like a state that anyone could ever reach.

Note also the caveat that I had in my self-concept post:

[EDITED TO ADD: A few people have asked whether I can be confident that this has really been sufficient to cure my depression, so I should clarify: I believe that this has taken care of the original reason why I had feelings of insecurity, insufficiency etc., feelings which then drove me to do various things that led to burnout and depression. Whether the original cause of those behaviors and feelings has been dealt with is a distinct question from whether the depression that they caused has been dealt with. After all, depression can cause various changes to the brain that linger long after the original cause is gone. I don’t know whether the depression will come back or not, but I do expect that many of the factors that originally caused it and maintained it have now been fixed; still, there may be others.]

Tentatively considering emotional stories (IFS and “getting into Self”)

2018-11-30T07:40:02.710Z · score: 38 (10 votes)
Comment by kaj_sotala on Incorrect hypotheses point to correct observations · 2018-11-20T22:00:19.543Z · score: 10 (5 votes) · LW · GW

(Although there's an obvious connection, I didn't really write this post as a commentary for the "mysticism wars", and would rather not have the post be "politicized" by being overly associated with them.)

Incorrect hypotheses point to correct observations

2018-11-20T21:10:02.867Z · score: 67 (28 votes)
Comment by kaj_sotala on Rationality Is Not Systematized Winning · 2018-11-17T08:36:22.927Z · score: 3 (1 votes) · LW · GW

I’d add a caveat that I suspect it’s not quite depression that does it, but something else, which I’m not sure I can name accurately enough to be useful…

This sounds right to me. Something in the rough space of depression, but not quite the same thing.

I think you might be—from your LW-rationality-influenced vantage point—underestimating how prevalent various cognitive distortions (or, let’s just say it in plain language: stupidity and wrongheadedness) are in even “average smart people”.

That's certainly possible, and I definitely agree that there are many kinds of wrongheadedness that are common in smart people but seem to be much less common among LW readers.

That said, my impression of "average smart people" mostly comes from the people I've met at university, hobbies, and the like. I don't live in the Bay or near any of the rationalist hubs. So most of the folks I interact with, and am thinking about, aren't active LW readers (though they might have run across the occasional LW article). It's certainly possible that I'm falling victim to some kind of selection bias in my impression of the average smart person, but I doubt that being too influenced by LW rationality is the filter in question.

Much of the best of what LW has to offer has always been (as one old post here put it) “rationality as non-self-destruction”. The point isn’t necessarily that you’re rational, and therefore, you win; the point is that by default, you lose, in various stupid and avoidable ways; LW-style rationality helps you not do that.

Hmm. "Rationalists might not win, but at least they don't lose just because they're shooting themselves in the foot." I like that, and think that I agree.

Comment by kaj_sotala on Rationality Is Not Systematized Winning · 2018-11-16T12:52:15.116Z · score: 3 (1 votes) · LW · GW
Well, there’s something odd about that formulation, isn’t it? You’re treating “adequacy” as a binary property, it seems; but that’s not inherent in anything I recognize as “LW rationality”.

Well, let's use the automation thing as an example.

I know that existing track records for how much career security etc. various jobs offer aren't going to be of much use. I also know that existing expert predictions on which jobs are going to stay reliable aren't necessarily very reliable either.

So now I know that I shouldn't rely on the previous wisdom on the topic. The average smart person reading the news has probably figured this out too, with all the talk about technological unemployment. I think that LW rationality has given me a slightly better understanding of the limitations of experts, so compared to the average smart person, I know that I probably shouldn't rely too much on the new thinking on the topic, either.

Great. But what should I do instead? LW rationality doesn't really tell me, so in practice - if I go with the "screw it, I'll just do something" mentality, I just fall back into going with the best expert predictions anyway. Going with the "screw it" mentality means that LW rationality doesn't hurt me in this case, but it doesn't particularly benefit me, either. It just makes my predictions less certain, without changing my actions.

Surely the “pure” form of the instrumental imperative to “maximize expected utility” (or something similar in spirit if not in implementation details) doesn’t have any trouble whatsoever with there being multiple options, all of which are somehow less than ideal. Pick whatever’s least bad, and go to it…

Logically, yes. That's what I do these days.

That said, many people need some reasonable-seeming level of confidence before embarking on a project. "I don't think that any of this is going to work, but I'll just do something anyway" tends to be psychologically hard. (Scott has speculated that "very low confidence in anything" is what depression is.)

My anecdotal observation is that there are some people - including myself in the past - who encounter LW, have it hammered in how uncertain they should be about everything, and then this contributes to driving their confidence levels down to the point where they'll be frequently paralyzed when making decisions. All options feel too uncertain to feel worth acting upon and none of them meets whatever minimum threshold is required for the brain to consider something worth even trying, so nothing gets done.

I say that LW sometimes contributes to this, not that it causes it; it doesn't have that effect on everyone. You probably need previous psychological issues, such as a pre-existing level of depression or generally low self-confidence, for this to happen.

Comment by kaj_sotala on Rationality Is Not Systematized Winning · 2018-11-15T03:29:43.253Z · score: 3 (1 votes) · LW · GW

I agree that it's useful in realizing that the default path is likely to be insufficient. I'm not sure that it's particularly useful in helping figure out what to do instead, though. I feel like there have been times when LW rationality has even been a handicap to me, in that it has left me with an understanding of how every available option is somehow inadequate, but failed to suggest anything that would be adequate. The result has been paralysis, when "screw it, I'll just do something" would probably have produced a better result.

Comment by kaj_sotala on Future directions for ambitious value learning · 2018-11-13T17:04:16.840Z · score: 11 (4 votes) · LW · GW

One approach which I didn't see obviously listed here, though it is related to e.g. "The structure of the planning algorithm", is to first construct a psychological and philosophical model of what exactly human values are and how they are represented in the brain, before trying to translate them into a utility function.

One (but not the only possible) premise for this approach is that the utility function formalism is not particularly suited for things like changing values or dealing with ontology shifts; while a utility function may be a reasonable formalism for describing the choices that an agent would make at any given time, the underlying mechanism that generates those choices is not particularly well-characterized by a utility function. A toy problem that I have used before is the question of how to update your utility function if it was previously based on an ontology defined in N dimensions, but suddenly the ontology gets updated to include N+1 dimensions:

... we can now consider what problems would follow if we started off with a very human-like AI that had the same concepts as we did, but then expanded its conceptual space to allow for entirely new kinds of concepts. This could happen if it self-modified to have new kinds of sensory or thought modalities that it could associate its existing concepts with, thus developing new kinds of quality dimensions.
An analogy helps demonstrate this problem: suppose that you're operating in a two-dimensional space, where a rectangle has been drawn to mark a certain area as "forbidden" or "allowed". Say that you're an inhabitant of Flatland. But then you suddenly become aware that actually, the world is three-dimensional, and has a height dimension as well! That raises the question of, how should the "forbidden" or "allowed" area be understood in this new three-dimensional world? Do the walls of the rectangle extend infinitely in the height dimension, or perhaps just some certain distance in it? If just a certain distance, does the rectangle have a "roof" or "floor", or can you just enter (or leave) the rectangle from the top or the bottom? There doesn't seem to be any clear way to tell.
As a historical curiosity, this dilemma actually kind of really happened when airplanes were invented: could landowners forbid airplanes from flying over their land, or was the ownership of the land limited to some specific height, above which the landowners had no control? Courts and legislation eventually settled on the latter answer.
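The Flatland ambiguity can be sketched in code. Here an "allowed" predicate learned in a 2-D ontology admits multiple mutually inconsistent extensions once a third dimension appears; all function names and the specific boundary values are my own hypothetical illustration:

```python
def forbidden_2d(x, y):
    """The forbidden rectangle in the original, two-dimensional ontology."""
    return 0 <= x <= 10 and 0 <= y <= 10

# After the ontology shift, several extensions are all consistent with the
# original predicate, since z was never observed in the old ontology:

def forbidden_3d_infinite_walls(x, y, z):
    # The walls extend infinitely in the height dimension.
    return forbidden_2d(x, y)

def forbidden_3d_bounded_walls(x, y, z, height=5):
    # The walls stop at some height; you may pass above them.
    return forbidden_2d(x, y) and z <= height
```

Both 3-D versions agree with `forbidden_2d` on every observation the old ontology could express, so nothing in the original "values" picks between them; the choice has to come from some further reasoning process, which is exactly the question posed below about how humans update their values.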

In a sense, we can say that law is a kind of a utility function representing a subset of human values at some given time; when the ontology that those values are based on shifts, the laws get updated as well. A question to ask is: what is the reasoning process by which humans update their values in such a situation? And given that a mature AI's ontology is bound to be different than ours, how do we want the AI to update its values / utility function in an analogous situation?

Framing the question this way suggests that constructing a utility function is the wrong place to start; rather we want to start with understanding the psychological foundation of human values first, and then figure out how we should derive utility functions from those. That way we can also know how to update the utility function when necessary.

Furthermore, as this post notes, humans routinely make various assumptions about the relation of behavior and preferences, and a proper understanding of the psychology and neuroscience of decision-making seems necessary for evaluating those assumptions.

Some papers that take this kind of an approach are Sotala 2016, Sarma & Hay 2017, Sarma, Safron & Hay 2018.

Comment by kaj_sotala on Rationality Is Not Systematized Winning · 2018-11-13T13:01:16.384Z · score: 4 (2 votes) · LW · GW

I don't feel like my work on AI has given me any particular advantage in figuring out how to deal with automation, especially since the kind of AI we're thinking about is mostly AGI and job-threatening automation is mostly narrow AI. I don't think I have a major advantage in figuring out which jobs seem likely to persist and which ones won't - at least not one that would be a further advantage on top of just reading the existing expert reports on the topic.

I think that the main difference between me and the average expert-report-reading, reasonably smart person is that I'm less confident in the expert opinion telling us anything useful / anybody being able to meaningfully predict any of this, but that just means that I have even less of an idea of what I should do in response to these trends.

Comment by kaj_sotala on Rationality Is Not Systematized Winning · 2018-11-12T20:19:03.129Z · score: 6 (3 votes) · LW · GW

Does LW-style rationality give you any major advantage in figuring out what to do as a consequence of major automation, though?

Comment by kaj_sotala on Implementations of immortality · 2018-11-01T23:02:52.245Z · score: 14 (5 votes) · LW · GW
The point about typical mind fallacy is well-taken but I don't really see how you can be confident in preferences like the one quoted above given that the timeframes we're talking about are much longer than your lifespan so far.

I'm not highly confident in them, but then your proposal also seems to make several assumptions about the nature of the preferences of very long-lived people. While "people will eventually get bored with routine" is plausible, so is "people will eventually get bored with constantly trying out new stuff, preferring more stability the older they get". At least the latter hypothesis doesn't seem significantly less likely than the former one, particularly given that currently-living humans do seem to shift towards an increased desire for stability the older they get.

In the face of uncertainty, we should be allowing people to engage in a variety of different approaches, rather than having the entire society locked into one approach (e.g. age stratification). Maybe it empirically turns out that some people will in fact never get bored with their Thursday routine (or prefer to pre-emptively modify their brains so that they never will), while others do prefer to modify their routine, but less than would be implied in your proposal, while others still end up creating a subculture that's similar to the one you've outlined.

People get stuck in local maxima, and often don't explore enough to find better options for themselves. The longer people live, the more valuable it is to have sufficient exploration to figure out the best option before choosing stability.

Certainly, but there are many ways of encouraging exploration while also letting you remain stable if you so prefer: e.g. AIs doing psychological profiling and suggesting things that you might have neglected to explore but would predictably enjoy, human-computer interfaces letting you view the experiences and memories of others the way that we watch movies today, etc.

Comment by kaj_sotala on Implementations of immortality · 2018-11-01T17:15:39.262Z · score: 26 (5 votes) · LW · GW

I was recently talking to a friend about what key features society would need for people to be happy throughout arbitrarily-long lives - in other words, what would a utopia for immortals look like? He said that it would require continual novelty.

There's a high risk of typical-minding here; different people value and are made happy by different things. I get the feeling that you value novelty vastly more than I do; when I imagine a utopia, routine and stability almost immediately jump into mind as desired features, as opposed to novelty. Obviously I would desire some amount of novelty, but it's mostly in the context of slotting into a roughly stable daily or weekly routine, rather than the routine itself varying much. (e.g. Thursday evening is for games, the games may vary and becoming increasingly complex, but they are still generally played on Thursday evenings). At the very least, I would want a mostly stable "island of calm" where things mostly remained the same, and where I would always return when I was tired of going on adventures.

As a consequence, I find myself having a very strong aversion to most of what you have sketched out here - it may very well be utopian to people like you, but it reads as distinctly dystopian to me. In particular, this bit

But I concede that left to themselves, people wouldn't necessarily seek out this variety - they might just decide on a type, and stick to it. I think that age stratification helps solve this problem too.

basically reads to me like you're saying that it's a problem if people like me have the freedom to choose stability if that makes them happier than variety does. (I expect that you didn't mean it that way, but that's how it reads to me.)

Comment by kaj_sotala on Implementations of immortality · 2018-11-01T16:50:32.553Z · score: 9 (4 votes) · LW · GW

(minor nitpick)

Firstly because subcultures usually need something to define themselves in opposition to

This is not my experience; e.g. various geek, fandom, and art subcultures define themselves primarily around a positive thing that they are creating/doing/enjoying, rather than as being opposed to anything in particular. Sure, there might be a bit of in-group snobbishness and feeling superior to "mundanes" or whatever, but at least in the groups I've participated in, it's not particularly pronounced or necessarily even present at all.

Comment by kaj_sotala on Starting Meditation · 2018-10-26T08:35:01.906Z · score: 5 (2 votes) · LW · GW

The stage descriptions seem to match my experience pretty well. I can usually specify which stage I was at during a sit, give or take one stage. Applying the specific instructions for each stage has also been helpful: more than once, I've been struggling with a specific stage for an extended time, then gone back to re-read the methods prescribed for that stage, and then found almost instant improvement after applying the instructions, thinking that I should have re-read them much earlier.

Friends who are also practicing with the TMI system also seem to think that it's pretty easy to match their experience to the stage descriptions in the book.

Comment by kaj_sotala on Starting Meditation · 2018-10-25T19:46:31.487Z · score: 3 (1 votes) · LW · GW

I've been intending to write an extended review but haven't gotten around to it.

Which kinds of experiences are you referring to? My recollection is that he did mention individual variability a bunch of times.

Comment by kaj_sotala on Starting Meditation · 2018-10-24T15:44:42.871Z · score: 5 (2 votes) · LW · GW

Good luck! If you run into trouble or have questions which aren't answered by the book, /r/TheMindIlluminated/ is a helpful community.

Mark Eichenlaub: How to develop scientific intuition

2018-10-23T13:30:03.252Z · score: 68 (28 votes)
Comment by kaj_sotala on Outline of Metarationality, or much less than you wanted to know about postrationality · 2018-10-21T10:37:26.289Z · score: 18 (4 votes) · LW · GW

For myself I find this point is poorly understood by most self-identified rationalists, and I think most people reading the sequences come out of them as positivists because Eliezer didn't hammer the point home hard enough and positivism is the default within the wider community of rationality-aligned folks (e.g. STEM folks).

Maybe so, but I can't help noticing that whenever I try to think of concrete examples of what postrationality implies in practice, I always end up with examples that you could just as well justify using the standard rationalist epistemology. E.g. all my examples in this comment section. So while I certainly agree that the postrationalist epistemology is different from the standard rationalist one, I'm having difficulties thinking of any specific actions or predictions that you would really need the postrationalist epistemology to justify. Something like the criterion of truth is a subtle point which a lot of people don't seem to get, yes, but it also feels like one where whether or not you get it makes no practical difference. And theoretical points which people can disagree a lot about despite not making any practical difference are almost the prototypical example of tribal labels. John Tooby:

The more biased away from neutral truth, the better the communication functions to affirm coalitional identity, generating polarization in excess of actual policy disagreements. Communications of practical and functional truths are generally useless as differential signals, because any honest person might say them regardless of coalitional loyalty. In contrast, unusual, exaggerated beliefs—such as supernatural beliefs (e.g., god is three persons but also one person), alarmism, conspiracies, or hyperbolic comparisons—are unlikely to be said except as expressive of identity, because there is no external reality to motivate nonmembers to speak absurdities.

Comment by kaj_sotala on Outline of Metarationality, or much less than you wanted to know about postrationality · 2018-10-19T13:25:55.246Z · score: 7 (4 votes) · LW · GW

Indeed, something being true is further distinct from us considering it true. But given that the whole point of metarationality is fully incorporating the consequences of realizing the map/territory distinction and the fact that we never observe the territory directly (we only observe our brain's internal representation of the external environment, rather than the external environment directly), a rephrasing that emphasizes the way that we only ever experience the map seemed appropriate.

On insecurity as a friend

2018-10-09T18:30:03.782Z · score: 36 (18 votes)

Tradition is Smarter Than You Are

2018-09-19T17:54:32.519Z · score: 68 (24 votes)

nostalgebraist - bayes: a kinda-sorta masterpost

2018-09-04T11:08:44.170Z · score: 16 (7 votes)

New paper: Long-Term Trajectories of Human Civilization

2018-08-12T09:10:01.962Z · score: 27 (13 votes)

Finland Museum Tour 1/??: Tampere Art Museum

2018-08-03T15:00:05.749Z · score: 20 (6 votes)

What are your plans for the evening of the apocalypse?

2018-08-02T08:30:05.174Z · score: 24 (11 votes)

Anti-tribalism and positive mental health as high-value cause areas

2018-08-02T08:30:04.961Z · score: 26 (10 votes)

Fixing science via a basic income

2018-08-02T08:30:04.380Z · score: 30 (14 votes)

Study on what makes people approve or condemn mind upload technology; references LW

2018-07-10T17:14:51.753Z · score: 21 (11 votes)

Shaping economic incentives for collaborative AGI

2018-06-29T16:26:32.213Z · score: 47 (13 votes)

Against accusing people of motte and bailey

2018-06-03T21:31:24.591Z · score: 83 (27 votes)

AGI Safety Literature Review (Everitt, Lea & Hutter 2018)

2018-05-04T08:56:26.719Z · score: 37 (10 votes)

Kaj's shortform feed

2018-03-31T13:02:47.793Z · score: 12 (2 votes)

Helsinki SSC March meetup

2018-03-26T19:27:17.850Z · score: 12 (2 votes)

Is the Star Trek Federation really incapable of building AI?

2018-03-18T10:30:03.320Z · score: 29 (8 votes)

My attempt to explain Looking, insight meditation, and enlightenment in non-mysterious terms

2018-03-08T07:37:54.532Z · score: 272 (96 votes)

Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

2018-02-12T12:30:04.401Z · score: 58 (16 votes)

On not getting swept away by mental content

2018-01-25T20:30:03.750Z · score: 23 (7 votes)

Papers for 2017

2018-01-04T13:30:01.406Z · score: 32 (8 votes)

Paper: Superintelligence as a Cause or Cure for Risks of Astronomical Suffering

2018-01-03T13:57:55.979Z · score: 16 (6 votes)

Fixing science via a basic income

2017-12-08T14:20:04.623Z · score: 38 (11 votes)

Book review: The Upside of Your Dark Side: Why Being Your Whole Self–Not Just Your “Good” Self–Drives Success and Fulfillment

2017-12-04T13:10:06.995Z · score: 27 (8 votes)

Meditation and mental space

2017-11-06T13:10:03.612Z · score: 26 (7 votes)

siderea: What New Atheism says

2017-10-29T10:19:57.863Z · score: 12 (3 votes)

Postmodernism for rationalists

2017-10-17T12:20:36.139Z · score: 24 (1 votes)

Anti-tribalism and positive mental health as high-value cause areas

2017-10-17T10:20:03.359Z · score: 30 (10 votes)

You can never be universally inclusive

2017-10-14T11:30:04.250Z · score: 34 (10 votes)

Meaningfulness and the scope of experience

2017-10-05T11:30:03.863Z · score: 36 (13 votes)

Social Choice Ethics in Artificial Intelligence (paper challenging CEV-like approaches to choosing an AI's values)

2017-10-03T17:39:00.683Z · score: 8 (3 votes)

Nobody does the thing that they are supposedly doing

2017-09-23T10:40:06.155Z · score: 54 (29 votes)

Brief update on the consequences of my "Two arguments for not thinking about ethics" (2014) article

2017-04-05T11:25:39.618Z · score: 14 (15 votes)

Making intentions concrete - Trigger-Action Planning

2016-12-01T20:34:36.483Z · score: 35 (31 votes)

[moderator action] Eugine_Nier is now banned for mass downvote harassment

2014-07-03T12:04:26.087Z · score: 107 (126 votes)

How habits work and how you may control them

2013-10-12T12:17:42.908Z · score: 65 (67 votes)

A brief history of ethically concerned scientists

2013-02-09T05:50:00.045Z · score: 68 (74 votes)

Three kinds of moral uncertainty

2012-12-30T10:43:30.669Z · score: 37 (36 votes)

How to Run a Successful Less Wrong Meetup

2012-06-12T21:32:42.605Z · score: 66 (62 votes)

Fallacies as weak Bayesian evidence

2012-03-18T03:53:34.216Z · score: 58 (66 votes)

The curse of identity

2011-11-17T19:28:49.359Z · score: 129 (132 votes)

Problems in evolutionary psychology

2010-08-13T18:57:40.454Z · score: 62 (70 votes)

What Cost for Irrationality?

2010-07-01T18:25:06.938Z · score: 62 (68 votes)

Your intuitions are not magic

2010-06-10T00:11:30.121Z · score: 67 (69 votes)

The Psychological Diversity of Mankind

2010-05-09T05:53:54.487Z · score: 79 (81 votes)

What is Bayesianism?

2010-02-26T07:43:53.375Z · score: 82 (86 votes)