Formalising decision theory is hard 2019-08-23T03:27:24.757Z · score: 18 (19 votes)
Quantifying anthropic effects on the Fermi paradox 2019-02-15T10:51:04.298Z · score: 26 (12 votes)


Comment by lanrian on Dominic Cummings: "we’re hiring data scientists, project managers, policy experts, assorted weirdos" · 2020-01-05T13:13:55.533Z · score: 10 (2 votes) · LW · GW

And yet here's a rationalist who upturned global politics singlehandedly, and credits LessWrong with his success.

Source? I've googled his name and LessWrong, but can't find him saying anything about it.

Comment by lanrian on Can fear of the dark bias us more generally? · 2019-12-22T23:20:18.756Z · score: 3 (3 votes) · LW · GW

This is just anecdotal, but me, a friend, and plausibly Randall Munroe are significantly more socially risk-taking at night than during other times of day. This might be directly connected to the time of day, or just be a consequence of sleep deprivation.

I also have irrational fears during the night, sometimes, but I would guess that this is largely due to being sleepy, stupid, and alone, which causes me to be more suggestible to stray thoughts in general. I wouldn't be surprised if darkness also contributes, though.

Comment by lanrian on [Personal Experiment] One Year without Junk Media · 2019-12-18T09:45:18.710Z · score: 2 (2 votes) · LW · GW

You can sculpt a service like this out of pretty much any service that keeps its old shows around. For example, you can use ublock origin to remove all side-bar video-suggestions from youtube (and also remove comments, if you want). Then you can just forbid yourself from going to the home page (or automatically block it with something like leechblock), and only ever access videos by doing search directly. If you have a way of adding any search engine to your browser (which I recommend getting; I think there are easy ways to do this in most browsers, though I use vimium), you can add or or whatever you want to it.

Comment by lanrian on How time tracking can help you prioritize · 2019-12-16T22:34:06.245Z · score: 4 (2 votes) · LW · GW

If you want automatic time tracking for a mac (with good support for manual assignment), and you're willing to pay for it, I tested several options two years back and decided that timing was best. I'm still happy with it. A nice thing with automatic time-tracking is that I have decent data for a year where I didn't use it actively at all.

That said, I don't want to ruin the call-to-action by forcing anyone to consider several options. I haven't tried toggle, but it seems like a good choice. Go forth and try time tracking!

Comment by lanrian on When would an agent do something different as a result of believing the many worlds theory? · 2019-12-16T20:01:57.051Z · score: 1 (1 votes) · LW · GW

Wait, what? If compatibilism doesn't suggest that I'm choosing between actions, what am I choosing between?

Comment by lanrian on When would an agent do something different as a result of believing the many worlds theory? · 2019-12-16T18:01:13.647Z · score: 1 (1 votes) · LW · GW

MWI is deterministic, so you can't alter the percentages by any kind of free will, despite what people keep asserting.

Neither most collapse-theories nor MWI allow for super-physical free will, so that doesn't seem relevant to this question. Since the question concerns what one should do, it seems reasonable to assume that some notion of choice is possible.

(FWIW, I'd guess compatibilism is the most popular take on free will on LW.)

Comment by lanrian on Toon Alfrink's sketchpad · 2019-12-12T21:01:46.552Z · score: 2 (2 votes) · LW · GW

We have absolutely no reason to believe that behavior correlates with consciousness.

The strong version of this can't be true. You claiming that you're conscious is part of your behaviour. Hopefully, it's approximately true that you would claim that you're conscious iff you believe that you're conscious. If behaviour doesn't at all correlate with consciousness, it follows that your belief in consciousness doesn't at all correlate with you being conscious. Which is a reductio, because the whole point of having beliefs is to correlate them with the truth.

Comment by lanrian on The Lesson To Unlearn · 2019-12-09T22:14:35.144Z · score: 4 (2 votes) · LW · GW

I'm not sure what you mean by "the spirit of passing tests is hacking them". Do you mean that the tests were intentionally designed to be hackable? Because it seems like Graham is very much not saying that:

Merely talking explicitly about this phenomenon is likely to make things better, because much of its power comes from the fact that we take it for granted. After you've noticed it, it seems the elephant in the room, but it's a pretty well camouflaged elephant. The phenomenon is so old, and so pervasive. And it's simply the result of neglect. No one meant things to be this way. This is just what happens when you combine learning with grades, competition, and the naive assumption of unhackability.

I'm less confident of what Hotel Concierge's point is (partly because it's a damn long essay with many distinct points), but at least the end of it I'd summarize as: "Success correlates with misery, because too much perfectionism contributes to both, and it's a problem when people are pushed and selected towards being too perfectionist". Some relevant passages:

I think the psychopathology term for TDTPT [this means The Desire To Pass Tests] is “perfectionism.” [...]
Perfectionism—like literally everything else—is part of a spectrum, good in moderation and dangerous in overdose. [...]
And so if you select for high TDTPT, if you take only the highest scores and most feverishly dedicated hoop-jumping applicants, then there is no way around it: you are selecting for a high fraction of unhappy people. [...]
I’ve used Scantron-centric examples because Scantrons are easy to quantify, but tests are everywhere, and I promise you that the same trait that made me check my answers ten times over is present in the girl that spends two and a half hours doing her makeup, pausing every five minutes to ask a roommate if she looks ugly. TDTPT is the source of anorexia, body dysmorphia, workaholism, anxiety (“I just can’t find anything to say that doesn’t sound stupid”), obsession, and a hundred million cases of anhedonia, fatigue, and inadequacy [...]

So they're saying that a little TDTPT is good for you, but many people have too much, and that's bad for their mental health. Looking at that last paragraph, the examples aren't particularly connected to the hackability of tests, as far as I can tell. That there are important tests which are testing arbitrary things certainly contributes to the problem, since being perfectionist about meaningless work is less productive and more likely to lead to burnout, but it's not essential to the point, and it's not something that Hotel Concierge emphasizes.

Comment by lanrian on The Lesson To Unlearn · 2019-12-09T18:36:33.968Z · score: 8 (5 votes) · LW · GW

I don't think the essays say the same thing. Paul Graham claims that school in particular encourages hacking tests, and that the spirit of hacking tests is bad for many real-world problems. Hotel Concierge equates the will to do well on tests with perfectionism, and thinks that this helps a lot with success in all parts of life, but causes misery for the perfectionists. The distinction between hackable tests and non-hackable tests is neither emphasized nor necessary for the latter, while it's central to the former.

Comment by lanrian on Karate Kid and Realistic Expectations for Disagreement Resolution · 2019-12-06T15:43:26.500Z · score: 7 (3 votes) · LW · GW
People have a really hard time with interventions often because they literally do not have a functioning causal model of the thing in question. People who apply deliberate practice to a working causal model often level up astonishingly quickly.

Could you give an example of a functioning causal model that quickly helps you learn a topic?

I'm not sure whether you're thinking about something more meta-level, "what can I practice that will cause me to get better", or something more object-level, "how does mechanics work", and I think an example would help clarify. If it's the latter, I'm curious about what the difference is between having a functioning causal model of the subject (the precondition for learning) and being good at the subject (the result of learning).

Comment by lanrian on Effect of Advertising · 2019-11-27T21:25:33.940Z · score: 2 (2 votes) · LW · GW
Maybe I am too negative about advertising, but it seems like its major strategy is to annoy me. Like that advertisement I won't mention that I have recently seen (the first five seconds of) perhaps several hundred times, because YouTube plays it at the beginning of almost every video I see.

FYI, adblockers (like ublock origin) work fine to prevent all of youtube's ads, including the video ones.

Comment by lanrian on Hazard's Shortform Feed · 2019-11-25T07:59:03.106Z · score: 1 (1 votes) · LW · GW

That does seem to change things... Although I'm confused about what simplicity is supposed to refer to, now.

In a pure bayesian version of this setup, I think you'd want some simplicity prior over the worlds, and then discard inconsistent worlds and renormalize every time you encounter new data. But you're not speaking about simplicity of worlds, you're speaking about simplicity of propositions, right?

Since a proposition is just a set of worlds, I guess you're speaking about the combined simplicity of all the worlds. And it makes sense that that would increase if the proposition is consistent with more worlds, since any of the worlds would indeed lead to the proposition being true.

So now I'm at "The simplicity of a proposition is proportional to the prior-weighted number of worlds that it's consistent with". That's starting to sound closer, but you seem to be saying that "The simplicity of a proposition is proportional to the number of other propositions that it's consistent with"? I don't understand that yet.

(Also, in my formulation we need some other kind of simplicity for the simplicity prior.)
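The pure Bayesian setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four worlds, their prior weights, and the helper names (`renormalize`, `update`, `proposition_weight`) are all hypothetical, and the prior is simply given rather than derived from any notion of simplicity.

```python
from fractions import Fraction

def renormalize(weights):
    """Rescale the weights so that they sum to 1."""
    total = sum(weights.values())
    return {world: w / total for world, w in weights.items()}

def update(prior, consistent_worlds):
    """Discard worlds that are inconsistent with new data, then renormalize."""
    kept = {world: w for world, w in prior.items() if world in consistent_worlds}
    return renormalize(kept)

def proposition_weight(prior, proposition):
    """Prior-weighted 'number' of worlds that a proposition (a set of worlds) contains."""
    return sum((prior[world] for world in proposition), Fraction(0))

# Hypothetical simplicity prior over four worlds (made-up weights).
prior = {
    "w1": Fraction(1, 2),
    "w2": Fraction(1, 4),
    "w3": Fraction(1, 8),
    "w4": Fraction(1, 8),
}

narrow = {"w1"}        # a proposition true in one world
broad = {"w1", "w2"}   # a proposition true in strictly more worlds

print(proposition_weight(prior, narrow), proposition_weight(prior, broad))

# New data rules out w1: drop it and renormalize the rest.
posterior = update(prior, {"w2", "w3", "w4"})
print(posterior)
```

In this sketch a proposition that is consistent with more worlds can only gain weight, which matches the "prior-weighted number of worlds" reading; it says nothing about the "number of other propositions" reading.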

Comment by lanrian on Hazard's Shortform Feed · 2019-11-24T19:39:37.532Z · score: 1 (1 votes) · LW · GW

Roughly, A is simpler than B if all data that is consistent with A is a subset of all data that is consistent with B.

Maybe the less rough version is better, but this seems like a really bad formulation. Consider (a) an exact enumeration of every event that ever happened, making no prediction of the future, vs (b) the true laws of physics and the true initial conditions, correctly predicting every event that ever happened and every event that will happen.

Intuitively, (b) is simpler to specify, and we definitely want to assign (b) a higher prior probability. But according to this formulation, (a) is simpler, since all future events are consistent with (a), while almost none are consistent with (b). Since both theories have the same amount of evidence, we'd be forced to assign higher probability to (a).

Comment by lanrian on RAISE post-mortem · 2019-11-24T19:28:51.358Z · score: 14 (7 votes) · LW · GW

Good post!

Maybe this is too nitpicky, but "the most impactful years of your life will be 100x more impactful than the average" is necessarily false, because your career is so short that those years will increase the average. For example, if you have a 50-year career and all of your impact happens during a period of two years, your average yearly impact is 2/50=1/25 times as high as your impact during those two years. However, "the most impactful years of your life will be 100x more impactful than the median" could be true.
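The bound can be checked with a few lines of Python. This is a sketch with made-up numbers: the function name `max_peak_to_average_ratio` and the example impact figures are illustrative assumptions, not anything from the post, and impact is assumed to be non-negative in every year.

```python
from statistics import mean, median

def max_peak_to_average_ratio(career_years: int, peak_years: int) -> float:
    """Upper bound on (yearly impact in the peak years) / (average yearly impact).

    The bound is reached when all impact I falls in the peak years:
    peak yearly impact = I / peak_years, average yearly impact = I / career_years,
    so the ratio is career_years / peak_years. Assumes impact is never negative.
    """
    return career_years / peak_years

print(max_peak_to_average_ratio(50, 2))  # 25.0

# Hypothetical career: 48 ordinary years and 2 peak years that are 100x as impactful.
yearly_impact = [1.0] * 48 + [100.0] * 2
peak = max(yearly_impact)
print(peak / median(yearly_impact))  # ratio to the median year
print(peak / mean(yearly_impact))    # ratio to the mean year
```

With these numbers the peak years are 100x the median year, while the ratio to the mean stays below the bound of 25.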

Comment by lanrian on Building Intuitions On Non-Empirical Arguments In Science · 2019-11-07T20:33:31.702Z · score: 8 (4 votes) · LW · GW

I feel like this is mostly a question of what you mean by "atlantis".

  • If you want to calculate P(evidence | the_specific_atlantis_that_newagers_specified_after_hearing_the_evidence) * P(the_specific_atlantis_that_newagers_specified_after_hearing_the_evidence), then the first term is going to be pretty high, and the second term would be very low (because it specifies a lot of things about what the Atlanteans did).
  • But if you want to calculate P(evidence | the_type_of_atlantis_that_people_mostly_associate_to_before_thinking_about_the_sphinx) * P(the_type_of_atlantis_that_people_mostly_associate_to_before_thinking_about_the_sphinx), the first term would be very low, while the second term would be somewhat higher.

The difference between the two cases is whether you think about the new agers as holding exactly one hypothesis and lying about what it predicts (as it cannot assign high probability to all of the things, since you're correct that the different probabilities must sum to 1), or whether you think about the new agers as switching to a new hypothesis every time they discover a new fact about the sphinx / every time they're asked a new question.

In this particular article, Scott mostly wants to make a point about cases where theories have similar P(E|T) but differ in the prior probabilities, so he focused on the first case.
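The two products in the bullet points above can be written out as a tiny Python sketch. Every number here is invented purely to show the shape of the calculation (high likelihood with a tiny prior versus low likelihood with a larger prior); none of it comes from the article, and the sketch takes no position on which product ends up larger.

```python
def unnormalized_posterior(likelihood: float, prior: float) -> float:
    """P(evidence | hypothesis) * P(hypothesis), i.e. the posterior up to a constant."""
    return likelihood * prior

# Hypothetical: an Atlantis story tailored to the evidence after the fact.
# It fits the evidence well, but it specifies so much that its prior is tiny.
specific_atlantis = unnormalized_posterior(likelihood=0.9, prior=1e-9)

# Hypothetical: the generic Atlantis people imagine before hearing about the sphinx.
# It has a higher prior, but it barely predicts the evidence.
generic_atlantis = unnormalized_posterior(likelihood=1e-6, prior=1e-4)

print(specific_atlantis, generic_atlantis)
```

Switching to a new tailored hypothesis for each new fact keeps the first factor high every time, but each switch pays again in the second factor.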

Comment by lanrian on Normative reductionism · 2019-11-07T19:54:33.906Z · score: 4 (3 votes) · LW · GW

It’s harmless (but silly (note: I LIKE silly, in part because it’s silly to do so)) to have such preferences, and usually harmless to act on them.

I don't really understand why preferences about things that you can't observe are more silly than other preferences, but that's ok. I mostly wanted to clear up the terminology, and note that it seems more like common usage of 'preference' and 'utility' to say "That's a silly preference to have, because X, Y, Z" and "I think we should only care about things that can affect us" instead of saying "Your satisfaction of that preference has nothing to do with their confidence, it’s all about whether you actually find out" and "Without some perceptible difference, your utility cannot be different".

Comment by lanrian on Normative reductionism · 2019-11-07T17:52:23.940Z · score: -2 (2 votes) · LW · GW

Does it help if you don't think about a 'preference' as something ontologically fundamental, but just as a convenient shorthand for something that an agent is optimising for? It's certainly possible for an agent to optimise for something even if they'll never receive any evidence of whether they succeeded. gjm gives a few examples in the sibling-comment to mine.

Comment by lanrian on Normative reductionism · 2019-11-06T20:00:11.808Z · score: 1 (1 votes) · LW · GW

I've personally used total consequentialism for this in the past (when arguing that non-causal decision theories + total consequentialism implies that we should assume that alien civilisations are common) and would support it being standard terminology. Many people know what total utilitarianism is, and making the switch for consequentialism is quite intuitive.

Comment by lanrian on Normative reductionism · 2019-11-06T19:30:53.058Z · score: 1 (1 votes) · LW · GW

Without some perceptible difference, your utility cannot be different.

This definition of "utility" (and your definition of "preference") is different from the one that most LWers use, different from the one that economists use, and different from the one that (at least some) professional philosophers use.

Economists use it to define any preference ordering over worlds, and don't require it to be defined only over your own experiences. Some ethical theories in philosophy (e.g. hedonistic utilitarianism) define it as a direct function of your experiences, but others (e.g. preference utilitarianism) define it as something that can be affected by things you don't know about. As evidence for the latter, this SEP page states:

If a person desires or prefers to have true friends and true accomplishments and not to be deluded, then hooking this person up to the experience machine need not maximize desire satisfaction. Utilitarians who adopt this theory of value can then claim that an agent morally ought to do an act if and only if that act maximizes desire satisfaction or preference fulfillment (that is, the degree to which the act achieves whatever is desired or preferred). What maximizes desire satisfaction or preference fulfillment need not maximize sensations of pleasure when what is desired or preferred is not a sensation of pleasure. This position is usually described as preference utilitarianism.

If you're a hedonistic utilitarian, feel free to argue for hedonistic utilitarianism, but do that directly instead of making claims about what other people are or aren't allowed to have preferences about.

Comment by lanrian on Normative reductionism · 2019-11-06T00:10:02.166Z · score: 8 (4 votes) · LW · GW

This SEP page defines:

Aggregative Consequentialism = which consequences are best is some function of the values of parts of those consequences (as opposed to rankings of whole worlds or sets of consequences).

Total Consequentialism = moral rightness depends only on the total net good in the consequences (as opposed to the average net good per person).

as two related and potentially identical concepts. When defining normative reductionism, do you mean that the value of the world is equal to the sum of the value of its parts? If so, total consequentialism is probably the closest term (though it's a bit unfortunate that they only contrast it with average utilitarianism).

Comment by lanrian on jacobjacob's Shortform Feed · 2019-10-30T22:50:02.880Z · score: 13 (8 votes) · LW · GW

(Potential reason for confusion: "don't endorse it" in habryka's first comment could be interpreted as not endorsing "this comment", when habryka actually said he didn't endorse his emotional reaction to the comment.)

Comment by lanrian on Why are people so bad at dating? · 2019-10-28T18:51:06.860Z · score: 71 (30 votes) · LW · GW

This seems like a misapplication of the concept of efficiency. The reason that a $20 bill on the ground is surprising is that a single competent agent would be enough to remove it from the world. Similarly, the reason that the efficient market hypothesis is a good approximation isn't that everyone who invests in the stock market is rational; instead, it's that a few highly informed individuals working full time are doing a great job at using up inefficiencies, which causes them to go away.

For every example that you pick, it's certainly true that some people are taking advantage of it (some people are using PhotoFeeler, some people have read Mate, etc), but there's no reason why this would translate into the advantages going away, or would automatically lead to everyone in the dating scene doing it. (Indeed, if someone is highly successful at dating, they're more likely to disappear from the dating scene than to stay in it.) Thus, it's highly disanalogous to efficient markets.

My main point is that humans are frequently unstrategic and bad, absent a lot of time investment and/or selection effects, so there's no particular reason to expect them to be great at dating. It may be true that they're even worse at dating than we would expect, but to draw that conclusion, the relevant comparisons are other things that lay people do in their spare time (ryan_b mentions job search, which seems like a good comparison), while theories assuming perfect rationality are unlikely to be useful.

(Another reason that humans are sometimes good at things is when they were highly useful for reproduction in the ancestral environment. While finding a mate was certainly useful, all of the mentioned examples concern things that have only become relevant during the past few hundred years, so it's not surprising that we're not optimised to make use of them.)

Comment by lanrian on The Dualist Predict-O-Matic ($100 prize) · 2019-10-24T10:44:28.253Z · score: 1 (1 votes) · LW · GW

SGD is not going to play the future forward to see the new feedback mechanism you’ve described and incorporate it into the loss function which is being minimized

My 'new feedback mechanism' is part of the training procedure. It's not going to be good at that by 'playing the future forward', it's going to become good at that by being trained on it.

I suspect we're using SGD in different ways, because everything we've talked about seems like it could be implemented with SGD. Do you agree that letting the Predict-O-Matic predict the future and rewarding it for being right, RL-style, would lead to it finding fixed points? Because you can definitely use SGD to do RL (first google result).

Comment by lanrian on The Dualist Predict-O-Matic ($100 prize) · 2019-10-23T09:45:17.223Z · score: 1 (1 votes) · LW · GW

Assuming that people don't think about the fact that Predict-O-Matic's predictions can affect reality (which seems like it might have been true early on in the story, although it's admittedly unlikely to be true for too long in the real world), they might decide to train it by letting it make predictions about the future (defining and backpropagating the loss once the future comes about). They might think that this is just like training on predefined data, but now the Predict-O-Matic can change the data that it's evaluated against, so there might be any number of 'correct' answers (rather than exactly 1). Although it's a blurry line, I'd say this makes its output more action-like and less prediction-like, so you could say that it makes the training process a bit more RL-like.

Comment by lanrian on Sets and Functions · 2019-10-22T19:15:59.560Z · score: 1 (1 votes) · LW · GW

As someone with half an undergrad's worth of math background, I've found these posts useful to grasp the purpose and some of the basics of category theory. It might be true that there exists some exposition out there which would work better, but I haven't found/read that one, and I'm happy that this one exists (among other things, it has the not-to-be-underestimated virtue of being uneffortful to read). Looking forward to the Yoneda and adjunction posts!

Comment by lanrian on The Dualist Predict-O-Matic ($100 prize) · 2019-10-22T09:26:22.640Z · score: 1 (1 votes) · LW · GW
Yes, that sounds more like reinforcement learning. It is not the design I'm trying to point at in this post.

Ok, cool, that explains it. I guess the main difference between RL and online supervised learning is whether the model takes actions that can affect its environment or only makes predictions of fixed data; so it seems plausible that someone training the Predict-O-Matic like that would think they're doing supervised learning, while they're actually closer to RL.

That description sounds a lot like SGD. I think you'll need to be crisper for me to see what you're getting at.

No need, since we already found the point of disagreement. (But if you're curious, the difference is that SGD makes a change in the direction of the gradient, and this one wouldn't.)

Comment by lanrian on The Dualist Predict-O-Matic ($100 prize) · 2019-10-18T10:58:38.810Z · score: 1 (1 votes) · LW · GW

I think our disagreement comes from you imagining offline learning, while I'm imagining online learning. If we have a predefined set of (situation, outcome) pairs, then the Predict-O-Matic's predictions obviously can't affect the data that it's evaluated against (the outcome), so I agree that it'll end up pretty dualistic. But if we put a Predict-O-Matic in the real world, let it generate predictions, and then define the loss according to what happens afterwards, a non-dualistic Predict-O-Matic will be selected for over dualistic variants.

If you still disagree with that, what do you think would happen (in the limit of infinite training time) with an algorithm that just made a random change proportional to how wrong it was, at every training step? Thinking about SGD is a bit complicated, since it calculates the gradient while assuming that the data stays constant, but if we use online training on an algorithm that just tries things until something works, I'm pretty confident that it'd end up looking for fixed points.
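A toy version of that "random change proportional to how wrong it was" learner can be sketched in Python. Everything here is an assumption made for illustration: the "model" is a single number (its prediction), and `world` is a hypothetical environment in which the outcome depends linearly on the published prediction, with a single fixed point at 2.0. It is not a claim about how a real Predict-O-Matic would be trained.

```python
import random

def world(prediction: float) -> float:
    """Hypothetical environment: the outcome depends on the published prediction.

    The only self-fulfilling prediction (fixed point) is prediction == 2.0.
    """
    return 0.5 * prediction + 1.0

def train_online(initial_prediction: float, steps: int, seed: int = 0) -> float:
    """Online training with undirected random updates scaled by the error."""
    rng = random.Random(seed)
    prediction = initial_prediction
    for _ in range(steps):
        outcome = world(prediction)        # the loss is only defined after predicting
        error = abs(prediction - outcome)  # how wrong the prediction turned out to be
        # Random change proportional to the error; no gradient is computed.
        prediction += error * rng.uniform(-1.0, 1.0)
    return prediction

final = train_online(initial_prediction=10.0, steps=10_000)
print(final, world(final))
```

In this toy setup the learner only stops moving where the error is zero, which is exactly where the prediction is self-fulfilling, so it drifts towards the fixed point without ever modelling itself explicitly. Whether the same happens for richer models and environments is the open question in the thread.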

Comment by lanrian on The Dualist Predict-O-Matic ($100 prize) · 2019-10-17T09:24:34.155Z · score: 5 (4 votes) · LW · GW
If dualism holds for Abram’s prediction AI, the “Predict-O-Matic”, its world model may happen to include this thing called the Predict-O-Matic which seems to make accurate predictions—but it’s not special in any way and isn’t being modeled any differently than anything else in the world. Again, I think this is a pretty reasonable guess for the Predict-O-Matic’s default behavior. I suspect other behavior would require special code which attempts to pinpoint the Predict-O-Matic in its own world model and give it special treatment (an “ego”).

I don't see why we should expect this. We're told that the Predict-O-Matic is being trained with something like SGD, and SGD doesn't really care about whether the model it's implementing is dualist or non-dualist; it just tries to find a model that generates a lot of reward. In particular, this seems wrong to me:

The Predict-O-Matic doesn't care about looking bad, and there's nothing contradictory about it predicting that it won't make the very prediction it makes, or something like that.

If the Predict-O-Matic has a model that makes bad predictions (i.e. looks bad), that model will be selected against. And if it accidentally stumbled upon a model that could correctly think about its own behaviour in a non-dualist fashion, and find fixed points, that model would be selected for (since its predictions come true). So at least in the limit of search and exploration, we should expect SGD to end up with a model that finds fixed points, if we train it in a situation where its predictions affect the future.

If we only train it on data where it can't affect the data that it's evaluated against, and then freeze the model, I agree that it probably won't exhibit this kind of behaviour; is that the scenario that you're thinking about?

Comment by lanrian on Misconceptions about continuous takeoff · 2019-10-09T18:09:36.060Z · score: 9 (3 votes) · LW · GW
Possibly you'd want to rule out (c) with your stipulation that the tests are "robust"? But I'm not sure you can get tests that robust.

That sounds right. I was thinking about an infinitely robust misalignment-oracle to clarify my thinking, but I agree that we'll need to be very careful with any real-world-tests.

If I imagine writing code and using the misalignment-oracle on it, I think I mostly agree with Nate's point. If we have the code and compute to train a superhuman version of GPT-2, and the oracle tells us that any agent coming out from that training process is likely to be misaligned, we haven't learned much new, and it's not clear how to design a safe agent from there.

I imagine a misalignment-oracle to be more useful if we use it during the training process, though. Concretely, it seems like a misalignment-oracle would be extremely useful to achieve inner alignment in IDA: as soon as the AI becomes misaligned, we can either rewind the training process and figure out what we did wrong, or directly use the oracle as a training signal that severely punishes any step that makes the agent misaligned. Coupled with the ability to iterate on designs, since we won't accidentally blow up the world on the way, I'd guess that something like this is more likely to work than . This idea is extremely sensitive to (c), though.

Comment by lanrian on Misconceptions about continuous takeoff · 2019-10-09T11:02:47.810Z · score: 4 (3 votes) · LW · GW
We might reach a state of knowledge when it is easy to create AIs that are (i) misaligned (ii) superhuman and (iii) non-singular (i.e. a single such AI is not stronger than the sum total of humanity and aligned AIs) but hard/impossible to create aligned superhuman AIs.

My intuition is that it'd probably be pretty easy to create an aligned superhuman AI if we knew how to create non-singular, mis-aligned superhuman AIs, and had cheap, robust methods to tell if a particular AI was misaligned. However, it seems pretty plausible that we'll end up in a state where we know how to create non-singular, superhuman AIs; strongly suspect that most/all of them are mis-aligned; but don't have great methods to tell whether any particular AI is aligned or mis-aligned. Does that sound right to you?

Comment by lanrian on Misconceptions about continuous takeoff · 2019-10-09T09:06:03.596Z · score: 21 (9 votes) · LW · GW
Second, we could more-or-less deal with systems which defect as they arise. For instance, during deployment we could notice that some systems are optimizing something different than what we intended during training, and therefore we shut them down.
Each individual system won’t by themselves carry more power than the sum of projects before it. Instead, AIs will only be slightly better than the ones that came before it, including any AIs we are using to monitor the newer ones.

If the sum of projects from before carries more power than the individual system, such that it can't win by defection, there's no reason for it to defect. It might just join the ranks of "projects from before", and subtly try to alter future systems to be similarly defective, waiting for a future opportunity to strike. If the way we build these things systematically renders them misaligned, we'll sooner or later end up with a majority of them being misaligned, at which point we can't trivially use them to shut down defectors.

(I agree that continuous takeoff does give us more warning, because some systems will presumably defect early, especially weaker ones. And IDA is kind of similar to this strategy, and could plausibly work. I just wanted to point out that a naive implementation of this doesn't solve the problem of treacherous turns.)

Comment by lanrian on What do the baby eaters tell us about ethics? · 2019-10-07T13:03:32.664Z · score: 5 (3 votes) · LW · GW
In the sequence introduction Eliezer says it makes points about "naturalistic metaethics" but I wonder what points are these specifically, since after reading the SEP page on moral naturalism I can't really figure out what the mind-independent moral facts are in the story.

I wouldn't necessarily expect Eliezer's usage to be consistent with Stanford's entry. LW in general and Eliezer in particular are not great at using words from academic philosophy in the same way that philosophers do (see e.g. "utilitarianism").

Comment by lanrian on Against "System 1" and "System 2" (subagent sequence) · 2019-10-06T10:28:15.740Z · score: 8 (5 votes) · LW · GW
The book is saying that the left hemisphere answers incorrectly, in both cases! As I said, this is surprising.

That's not just surprising, that's absurd. I can absolutely believe the claim that the left hemisphere always takes what's written for granted, and solves the syllogism formally. But the claim here is that the left hemisphere pays careful attention to the questions, solves them correctly, and then reverses the answer. Why would it do that? No mechanism is proposed.

I looked at the one paper that's mentioned in the quote (Deglin and Kinsbourne), and they never ask the subjects whether the syllogisms are 'structurally correct'; they only ask about the truth. And their main conclusion is that the left hemisphere always solves syllogisms formally, not that it's always wrong.

If you've heard the bizarre stories about patients confabulating after strokes (eg "my limb isn't paralyzed, I just don't want to move it") this is almost unilaterally associated with damage to the right hemisphere.

Interesting, I didn't know this only happened with the left hemisphere intact.

Comment by lanrian on Against "System 1" and "System 2" (subagent sequence) · 2019-10-04T10:47:35.781Z · score: 8 (5 votes) · LW · GW

But in a different situation, where one is asked the different question "Is this syllogism structurally correct?", even when the conclusion flies in the face of one's experience, it is the right hemisphere which gets the answer correct, and the left hemisphere which is distracted by the familiarity of what it already thinks it knows, and gets the answer wrong.

Wait what? Surely the left/right hemispheres are accidentally reversed here? Or is the book saying that the left hemisphere always answers incorrectly, no matter what question you ask?

Comment by lanrian on How can I reframe my study motivation? · 2019-09-25T18:39:26.924Z · score: 1 (1 votes) · LW · GW

Similarly to TurnTrout's point about the sequences, learning logic/computation/ML is certainly relevant to and useful for AI safety, but there are things in Superintelligence which no computer science textbook will tell you. It's certainly valuable to pick the most useful resources within whatever field you're trying to study, but picking your field based on which one has the best textbooks seems misguided.

Also, textbooks typically require a great deal more effort than popular science books, so if OP is struggling with motivation for the latter, textbooks are likely to make things worse.

Comment by lanrian on Logical Optimizers · 2019-08-24T21:25:02.372Z · score: 1 (1 votes) · LW · GW

Ah, sorry, I misread the terminology. I agree.

Comment by lanrian on Soft takeoff can still lead to decisive strategic advantage · 2019-08-23T19:35:42.608Z · score: 6 (4 votes) · LW · GW

Hm, my prior is that speed of learning how stolen code works would scale along with general innovation speed, though I haven't thought about it a lot. On the one hand, learning the basics of how the code works would scale well with more automated testing, and a lot of finetuning could presumably be automated without intimate knowledge. On the other hand, we might be in a paradigm where AI tech allows us to generate lots of architectures to test, anyway, and the bottleneck is for engineers to develop an intuition for them, which seems like the thing that you're pointing at.

Comment by lanrian on Formalising decision theory is hard · 2019-08-23T18:47:14.845Z · score: 3 (2 votes) · LW · GW
In INP, any reinforcement learning (RL) algorithm will converge to one-boxing, simply because one-boxing gives it the money. This is despite RL naively looking like CDT.

Yup, like Caspar, I think that model-free RL learns the EDT policy in most/all situations. I'm not sure what you mean by it looking like CDT.
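To make the quoted claim concrete, here is a minimal sketch of a model-free learner in a repeated Newcomb problem. Everything specific in it is an assumption for illustration: the epsilon-greedy bandit, the 99% predictor accuracy, the standard $1,000,000 / $1,000 payoffs, and the choice to have the predictor track the action actually taken (including exploratory ones). The function names are hypothetical.

```python
import random

ONE_BOX, TWO_BOX = 0, 1

def newcomb_payoff(action, accuracy, rng):
    """One round of Newcomb's problem (assumed payoffs, assumed predictor)."""
    # Assumption: the predictor guesses the action actually taken,
    # and is right with probability `accuracy`.
    predicted = action if rng.random() < accuracy else 1 - action
    opaque_box = 1_000_000 if predicted == ONE_BOX else 0
    transparent_box = 1_000 if action == TWO_BOX else 0
    return opaque_box + transparent_box

def run_bandit(rounds=20_000, epsilon=0.1, accuracy=0.99, seed=0):
    """Epsilon-greedy action-value learning with sample-average updates."""
    rng = random.Random(seed)
    q = [0.0, 0.0]      # estimated value of one-boxing and two-boxing
    counts = [0, 0]     # how often each action was taken
    for _ in range(rounds):
        if rng.random() < epsilon:
            action = rng.randrange(2)                        # explore
        else:
            action = ONE_BOX if q[0] >= q[1] else TWO_BOX    # exploit
        reward = newcomb_payoff(action, accuracy, rng)
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]   # running mean
    return q, counts

if __name__ == "__main__":
    q, counts = run_bandit()
    print("value estimates (one-box, two-box):", q)
    print("times chosen   (one-box, two-box):", counts)
```

Under these assumptions the learner only ever sees that one-boxing rounds pay more on average, so its value estimate for one-boxing ends up higher and it one-boxes in the greedy rounds. It never represents the causal structure of the problem at all.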

In Newcomb's paradox CDT succeeds but EDT fails. Let's consider an example where EDT succeeds and CDT fails: the XOR blackmail.

Isn't it the other way around? The one-boxer gets more money, but gives in to blackmail, and therefore gets blackmailed in the first place.

Comment by lanrian on Soft takeoff can still lead to decisive strategic advantage · 2019-08-23T18:29:41.145Z · score: 15 (6 votes) · LW · GW

Great post!

One thing I noticed is that claim 1 speaks about nation-states while most of the AI-bits speak about companies/projects. I don't think this is a huge problem, but it seems worth looking into.

It seems true that it'll be necessary to localize the secret bits into single projects, in order to keep things secret. It also seems true that such projects could keep a lead on the order of months/years.

However, note that this no longer corresponds to having a country that's 30 years ahead of the rest of the world. Instead, it corresponds to having a country with a single company 30 years ahead of the world. The equivalent analogy is: could a company transported 30 years back in time gain a decisive strategic advantage for itself / whatever country it landed in?

A few arguments:

  • A single company might have been able to bring back a single military technology, which may or may not have been sufficient to turn the world, alone. However, I think one can argue that AI is more multipurpose than most technologies.
  • If the company wanted to cooperate with its country, there would be an implementation lag after the technology was shared. In old times, this would perhaps correspond to building the new ships/planes. Today, it might involve taking AI architectures and training them for particular purposes, which could be more or less easy depending on the generality of the tech. (Maybe also scaling up hardware?) During this time, it would be easier for other projects and countries to steal the technology (though of course, they would have implementation lags of their own).
  • In the historical case, one might worry that a modern airplane company couldn't produce many useful things 30 years back in time, because they relied on new materials and products from other companies. Translated to the case where AI-companies develop along with the world, this would highlight that the AI-company could develop a 30-year-lead-equivalent in AI-software, but that might not correspond to a 30-year-lead-equivalent in AI-technology, insofar as progress is largely driven by improvements to hardware or other public inputs to the process. (Unless the secret AI-project is also developing hardware.) I don't think this is very problematic: hardware progress seems to be slowing down, while software is speeding up (?), so if everything went faster things would probably be more software driven?
  • Perhaps one could also argue that a 3-year lead would translate to an even greater lead, because of recursive self-improvement, in which case the company would have an even greater lead over the rest of the world.

Overall, these points don't seem too important, and I think your claims still go through.

Comment by lanrian on Logical Optimizers · 2019-08-23T17:35:01.692Z · score: 4 (2 votes) · LW · GW

One problem that could cause the searching process to be unsafe is if the prior contained a relatively large measure of malign agents. This could happen if you used the universal prior, per Paul's argument. Such agents could maximize across the propositions you test them on, but do something else once they think they're deployed.

Comment by lanrian on Is there a standard discussion of vegetarianism/veganism? · 2019-08-10T06:11:52.837Z · score: 5 (5 votes) · LW · GW
There's still a stigma to being vegan, so people are less likely to want to be friends with you, and your networking skills will suffer.

Note that the opposite can also be true, especially if your plan to improve the world involves engaging with the animal rights community, or other people who care about animals.

Comment by lanrian on Can we really prevent all warming for less than 10B$ with the mostly side-effect free geoengineering technique of Marine Cloud Brightening? · 2019-08-09T04:58:13.123Z · score: 13 (10 votes) · LW · GW

I prefer to reserve "literally lying" for when people intentionally say things that are demonstrably false. It's useful to have words for that kind of thing. As long as things are plausibly defensible, it seems better to say that he made "misleading statements", or something like that.

Actually, I'm not even sure that this was a particularly egregious error. Given that they never say they're going to rank things after the explicit cost-effectiveness estimates, not doing that seems quite reasonable to me. See for example GiveWell's "Why we can't take expected value estimates literally". All the arguments in that article should be even stronger when it's different people making estimates across different areas. If you think that people should "make a guess" even when they don't have time to do more research, that's a methodological disagreement with a non-obvious answer.
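One standard way to see why ranking by raw estimates can mislead is a Bayesian adjustment. The sketch below assumes a normal prior over cost-effectiveness and normally distributed estimate error; the function name and all numbers are made up for illustration and are not taken from the linked analysis.

```python
def posterior_mean(prior_mean, prior_var, estimate, estimate_var):
    """Posterior mean for a normal prior combined with a noisy normal estimate."""
    prior_precision = 1.0 / prior_var
    estimate_precision = 1.0 / estimate_var
    # Precision-weighted average: noisier estimates are pulled harder to the prior.
    return (prior_mean * prior_precision + estimate * estimate_precision) / (
        prior_precision + estimate_precision
    )

if __name__ == "__main__":
    # Hypothetical interventions, both judged against a prior of mean 1, variance 1.
    careful = posterior_mean(1.0, 1.0, estimate=10.0, estimate_var=1.0)
    rough = posterior_mean(1.0, 1.0, estimate=100.0, estimate_var=100.0)
    print(careful, rough)
```

With these made-up numbers, the rough estimate of 100 ends up below the careful estimate of 10 after adjustment, so a ranking that ignores the explicit figures is not automatically an error.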

I still think it's plausible that some of the economists were acting in bad faith (it's certainly bad that they don't even give qualitative justifications for some of their rankings). But when their actions are plausibly defensible in any particular instance, you need several different pieces of evidence to be confident of that (like where they get their funding from, if they're making systematic errors in the same direction, etc). If someone is saying things that I would classify as "literal lies", that's significantly stronger evidence that they're acting in bad faith, which means you can skip over some of that evidence-gathering. I thought that you were claiming that Lomborg had made such a statement, and the fact that he hadn't makes a large difference from my epistemic point of view, even if you have heard enough unrelated evidence to believe that he's systematically acting in bad faith.

Comment by lanrian on Can we really prevent all warming for less than 10B$ with the mostly side-effect free geoengineering technique of Marine Cloud Brightening? · 2019-08-08T18:36:51.672Z · score: 6 (6 votes) · LW · GW

He also literally lies in his cost-benefit analysis (by more than 10x).

What's the literal lie, here? The link seems to say that a group led by Lomborg made misleading statements about how they made their prioritisations, but I can't see any outright falsehoods.

Comment by lanrian on AI Safety Debate and Its Applications · 2019-07-29T15:04:42.100Z · score: 2 (2 votes) · LW · GW

What do you mean by picking pixels optimally? For very close to all images, I expect there to exist six pixels such that the judge identifies the correct label, if they are revealed. That doesn't seem like a meaningful metric, though.

Comment by Lanrian on [deleted post] 2019-07-21T20:50:20.308Z


In the three graphs below, we can see how the honest pl

Comment by lanrian on LessWrong FAQ · 2019-07-06T11:19:15.469Z · score: 1 (1 votes) · LW · GW

Also, I think the markdown syntax is wrong. For me, []() is just a link, while I get images if I do ![]()

Comment by lanrian on LessWrong FAQ · 2019-07-05T21:17:52.287Z · score: 6 (3 votes) · LW · GW

Cool thing that might or might not be worth mentioning in the "How do I insert images?"-section: If you select and copy an image from anywhere public, it will automatically work (note that it doesn't work if you right click and choose 'copy image'). This works for public google-docs, which is pretty useful for people who draft their posts in google docs. It also works if you paste them into a comment.

Comment by lanrian on Recommendation Features on LessWrong · 2019-06-15T13:07:29.540Z · score: 5 (3 votes) · LW · GW

Re addictiveness: a potential fix could be to add an option to only refresh the recommended archive posts once per day (or some other time period of your choice).

Comment by lanrian on LessWrong FAQ · 2019-06-14T21:52:40.034Z · score: 8 (5 votes) · LW · GW

Thanks a lot for this Ruby! After skimming, the only thing I can think of adding would be a link to the moderation log, along with a short explanation of what it records. Partly because it's good that people can look at it, and partly because it's nice to inform people that their deletions and bans are publicly visible.

Comment by lanrian on Get Rich Real Slowly · 2019-06-10T18:39:20.948Z · score: 5 (3 votes) · LW · GW

LW footnotes can be created like EA forum footnotes.