Vaniver's Shortform 2019-10-06T19:34:49.931Z · score: 10 (1 votes)
Vaniver's View on Factored Cognition 2019-08-23T02:54:00.915Z · score: 41 (9 votes)
Conversation on forecasting with Vaniver and Ozzie Gooen 2019-07-30T11:16:58.633Z · score: 41 (10 votes)
Commentary On "The Abolition of Man" 2019-07-15T18:56:27.295Z · score: 65 (15 votes)
Is there a guide to 'Problems that are too fast to Google'? 2019-06-17T05:04:39.613Z · score: 49 (15 votes)
Welcome to LessWrong! 2019-06-14T19:42:26.128Z · score: 94 (48 votes)
Steelmanning Divination 2019-06-05T22:53:54.615Z · score: 142 (58 votes)
Public Positions and Private Guts 2018-10-11T19:38:25.567Z · score: 90 (28 votes)
Maps of Meaning: Abridged and Translated 2018-10-11T00:27:20.974Z · score: 54 (22 votes)
Compact vs. Wide Models 2018-07-16T04:09:10.075Z · score: 32 (13 votes)
Thoughts on AI Safety via Debate 2018-05-09T19:46:00.417Z · score: 88 (21 votes)
Turning 30 2018-05-08T05:37:45.001Z · score: 75 (24 votes)
My confusions with Paul's Agenda 2018-04-20T17:24:13.466Z · score: 90 (22 votes)
LW Migration Announcement 2018-03-22T02:18:19.892Z · score: 139 (37 votes)
LW Migration Announcement 2018-03-22T02:17:13.927Z · score: 2 (2 votes)
Leaving beta: Voting on moving to 2018-03-11T23:40:26.663Z · score: 6 (6 votes)
Leaving beta: Voting on moving to 2018-03-11T22:53:17.721Z · score: 139 (42 votes)
LW 2.0 Open Beta Live 2017-09-21T01:15:53.341Z · score: 23 (23 votes)
LW 2.0 Open Beta starts 9/20 2017-09-15T02:57:10.729Z · score: 24 (24 votes)
Pair Debug to Understand, not Fix 2017-06-21T23:25:40.480Z · score: 8 (8 votes)
Don't Shoot the Messenger 2017-04-19T22:14:45.585Z · score: 11 (11 votes)
The Quaker and the Parselmouth 2017-01-20T21:24:12.010Z · score: 6 (7 votes)
Announcement: Intelligence in Literature Prize 2017-01-04T20:07:50.745Z · score: 9 (9 votes)
Community needs, individual needs, and a model of adult development 2016-12-17T00:18:17.718Z · score: 12 (13 votes)
Contra Robinson on Schooling 2016-12-02T19:05:13.922Z · score: 4 (5 votes)
Downvotes temporarily disabled 2016-12-01T17:31:41.763Z · score: 17 (18 votes)
Articles in Main 2016-11-29T21:35:17.618Z · score: 3 (4 votes)
Linkposts now live! 2016-09-28T15:13:19.542Z · score: 27 (30 votes)
Yudkowsky's Guide to Writing Intelligent Characters 2016-09-28T14:36:48.583Z · score: 4 (5 votes)
Meetup : Welcome Scott Aaronson to Texas 2016-07-25T01:27:43.908Z · score: 1 (2 votes)
Happy Notice Your Surprise Day! 2016-04-01T13:02:33.530Z · score: 14 (15 votes)
Posting to Main currently disabled 2016-02-19T03:55:08.370Z · score: 22 (25 votes)
Upcoming LW Changes 2016-02-03T05:34:34.472Z · score: 46 (47 votes)
LessWrong 2.0 2015-12-09T18:59:37.232Z · score: 92 (96 votes)
Meetup : Austin, TX - Petrov Day Celebration 2015-09-15T00:36:13.593Z · score: 1 (2 votes)
Conceptual Specialization of Labor Enables Precision 2015-06-08T02:11:20.991Z · score: 10 (11 votes)
Rationality Quotes Thread May 2015 2015-05-01T14:31:04.391Z · score: 9 (10 votes)
Meetup : Austin, TX - Schelling Day 2015-04-13T14:19:21.680Z · score: 1 (2 votes)
Sapiens 2015-04-08T02:56:25.114Z · score: 42 (36 votes)
Thinking well 2015-04-01T22:03:41.634Z · score: 28 (29 votes)
Rationality Quotes Thread April 2015 2015-04-01T13:35:48.660Z · score: 7 (9 votes)
Meetup : Austin, TX - Quack's 2015-03-20T15:12:31.376Z · score: 1 (2 votes)
Rationality Quotes Thread March 2015 2015-03-02T23:38:48.068Z · score: 8 (8 votes)
Rationality Quotes Thread February 2015 2015-02-01T15:53:28.049Z · score: 6 (6 votes)
Control Theory Commentary 2015-01-22T05:31:03.698Z · score: 18 (18 votes)
Behavior: The Control of Perception 2015-01-21T01:21:58.801Z · score: 31 (31 votes)
An Introduction to Control Theory 2015-01-19T20:50:02.624Z · score: 35 (35 votes)
Estimate Effect Sizes 2014-03-27T16:56:35.113Z · score: 1 (2 votes)
[LINK] Will Eating Nuts Save Your Life? 2013-11-30T03:13:03.878Z · score: 7 (12 votes)
Understanding Simpson's Paradox 2013-09-18T19:07:56.653Z · score: 11 (11 votes)


Comment by vaniver on Ban the London Mulligan · 2019-11-14T23:55:11.625Z · score: 5 (2 votes) · LW · GW

Interestingly, this seems to be pointing at the heart of the thing that I don't like about Magic, which I've long thought of as 'combos' (and sometimes as 'good cards'). That is, most of the Magic decks that I want to make are something like tribal creature decks, where I gradually accumulate Slivers or Knights or whatever (and, ideally, the power ramp looks quadratic instead of linear, but that's not essential). [I think Zvi, at one point, called this strategy of playing Magic the game of 'looking at pretty pictures'.]

Most of the Magic decks that get made by my friends who like Magic are closer to "here's a combination of three cards that means 'I win', and the deck is built around getting that combo out before the other person can get their combo out." Multiplayer Magic games (common among my friends) often have as a dominant strategy "sit there attempting to not look dangerous, until you can kill everyone in the same turn."

L5R had a card-by-card mulligan, which exacerbated this problem even more, I think, altho the replacement of lands with a fixed mana income (that could be saved across turns) did a lot to reduce the need for mulligans at all.

Comment by vaniver on Neural nets as a model for how humans make and understand visual art · 2019-11-12T23:36:23.367Z · score: 3 (1 votes) · LW · GW

The GAN's goal is to match these photos, not to match 3D scenes (which it doesn't know anything about).

I've seen some results here where I thought the consensus interpretation was "angle as latent feature", such that there was an implied 3D scene in the latent space. (Most of what I'm seeing now with a brief scan has to do with facial rotations and pose invariance.) Maybe I should put 'scene' in scare quotes, because it's generally not fully generic, as the sorts of faces and rooms you find in such a database are highly nonrandom / have a bunch of basic structure you can assume will be there.

Comment by vaniver on Neural nets as a model for how humans make and understand visual art · 2019-11-12T03:38:11.671Z · score: 15 (4 votes) · LW · GW

There's a long history of development of artistic concepts; for example, the discovery of perspective. There's also lots of artistic concepts where the dependence on the medium is highly significant, of which the first examples that come to mind are cave paintings (which were likely designed around being viewed by moving firelight) and 'highly distorted' ancient statuettes that are probably self-portraits.

So it seems to me like we should expect there to be a similar medium-relevance for NN-generated artwork, and a similar question of which artistic concepts it possesses and lacks. Perspective, for example, seems pretty well captured by GANs targeting the LSUN bedroom dataset or related tasks. It seems like it's not a surprise that NNs would be good at perspective compared to humans, since there's a cleaner inverse between the perception and the creation of perspective from the GAN's point of view than from the human's (who has to use their hands to make it, rather than their inverted eyes).

That is, it seems that the similarities are actually pretty constrained to bits where the medium means humans and NN operate similarly, and the path of human art generation and the path of NN art generation look quite different in a way that suggests there are core differences. For example, I think most humans have pretty good facility with creating and understanding 'stick figures' that comes from training on a history of communicating with other humans using stick figures, rather than simply generalizing from visual image recognition, and we might be able to demonstrate the differences through a training / finetuning scheme on some NN that can do both classification and generation.

We might want to look for concepts that are easier for humans than NNs; when I talk to people about ML-produced music, they often suggest that it's hard to capture the sort of dependencies that make for good music using current models (in the same way that current models have trouble making 'good art' that's more than style transfer or realistic faces or so on; it's unlikely that we could hook a NN up to a DeviantArt account and accept commissions and make money). But as someone who can listen to lots of Gandalf nodding to Jazz, and who thinks there are applications for things like CoverBot (which would do the acoustic equivalent of style transfer), my guess is that the near-term potential for ML-produced music is actually quite high.

Comment by vaniver on [Team Update] Why we spent Q3 optimizing for karma · 2019-11-08T17:39:12.697Z · score: 10 (5 votes) · LW · GW

I think the other thing A/B tests are good for is giving you a feedback source that isn't your design sense. Instead of "do I think this looks prettier?" you ask questions like "which do users click on more?". (And this eventually feeds back into your design sense, making it stronger.)

Comment by vaniver on [Team Update] Why we spent Q3 optimizing for karma · 2019-11-08T17:37:23.796Z · score: 12 (4 votes) · LW · GW

FWIW I don't have this effect (and am also at 3/10). But I think I was also always in the "if I like something at 50, I will upvote it anyway" camp instead of in the "I think this should have a karma of 40, and since it's at 50, I don't need to upvote it" camp.

Comment by vaniver on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-11-06T17:28:33.727Z · score: 3 (1 votes) · LW · GW
My opinion is that geoengineering solutions lead to more fragility than reducing emissions and we would be better off avoiding them or at least doing something along the lines of carbon sequestration and not SRM.

Sure, I think carbon sequestration is a solid approach as well (especially given that it's still net energy-producing to burn fossil fuels and sequester the resulting output as CO2 somewhere underground!), and am not familiar enough with the numbers to know if SRM is better or worse than sequestration. My core objection was that Russell's opinion of the NAS meeting wasn't "SRM has expected disasters or expected high costs that disqualify it", and instead it looked like the NAS thought it was more important to be adversarial to fossil fuel interests than to make the best engineering decision.

Comment by vaniver on The Curse Of The Counterfactual · 2019-11-03T04:59:29.263Z · score: 5 (2 votes) · LW · GW

I didn't see the contradiction in that the goal state was unreachable; I saw the contradiction as "I will should myself to the place where I don't use shoulds anymore", as opposed to something like "I wish I were the sort of person who used wishes instead of shoulds". In the first case, you can't remove the 'last should' because it's structural to why you don't have other shoulds.

Comment by vaniver on Vaniver's Shortform · 2019-10-31T17:39:31.853Z · score: 15 (6 votes) · LW · GW

I've been thinking a lot about 'parallel economies' recently. One of the main differences between 'slow takeoff' and 'fast takeoff' predictions is whether AI is integrated into the 'human civilization' economy or constructing a separate 'AI civilization' economy. Maybe it's worth explaining a bit more what I mean by this: you can think of 'economies' as collections of agents who trade with each other. Often an economy will have a hierarchical structure, and where we draw the lines is sort of arbitrary. Imagine a person who works at a company and participates in its internal economy, and the company participates in national and global economies, and the person participates in those economies as well. A better picture has a very dense graph with lots of nodes and links between groups of nodes whose heaviness depends on the number of links between nodes in those groups.

As Adam Smith argues, the ability of an economy to support specialization of labor depends on its size. If you have an island with a single inhabitant, it doesn't make sense to fully employ a farmer (since a full-time farmer can generate much more food than a single person could eat); for a village with 100 inhabitants it doesn't make sense to farm more than would feed a hundred mouths, and so on. But as you make more and more of a product, investments that have a small multiplicative payoff become better and better, to the point that a planet with ten billion people will have massive investment in farming specialization that makes it vastly more efficient per unit than the village farming system. So for much of history, increased wealth has been driven by this increased specialization of labor, which was driven by the increased size of the economy (both through population growth and decreased trade barriers widening the links between economies until they effectively became one economy).

One reason to think economies will remain integrated is because increased size benefits all actors in the economy on net; another is that some of the critical links will be human-human links, or that human-AI links will be larger than AI-AI links. But if AI-AI links have much lower friction cost, then it will be the case that the economy formed just of AI-AI links can 'separate' from the total civilizational economy, much in the way that the global economy could fragment through increased trade barriers or political destabilization (as has happened many times historically, sometimes catastrophically). More simply, it could be the case that all the interesting things are happening in the AI-only economy, even if it's on paper linked to the human economy. Here, one of the jobs of AI alignment could be seen as making sure that either there's continuity of value between the human-human economy and the AI-AI economy, or ensuring that the human-AI links remain robust so that humans are always relevant economic actors.

Comment by vaniver on On Internal Family Systems and multi-agent minds: a reply to PJ Eby · 2019-10-31T01:01:26.707Z · score: 24 (4 votes) · LW · GW
Could you say what exactly is the position that you are arguing for, in this conversation?

My inner pjeby is arguing for something like this:

"IFS is a specific model that has some similarities and connections to the best available view, but also clear failures that the best available view doesn't have. But giving IFS this much airtime only makes sense to the reader if you think IFS is the best available view! So you should either justify the implicit respect you're giving to IFS, or explicitly acknowledge the source of that respect (like by pointing out that IFS isn't the best available view, but you like it)."

Consider also my comment elsewhere; my sense is that the post is ambiguating between "IFS is of historical interest" and "IFS is how someone should begin to understand this topic" in a way that misses pjeby's meta-level criticisms of the message sent by giving IFS so much attention / not directly agreeing with criticisms. I came away thinking that you think pjeby is right but some part of IFS is worth salvaging, and when I put my pjeby hat on, I can't figure out which part you think is worth salvaging. There are some hypotheses, like that you have gratitude towards IFS, or you think enough rationalists are familiar with IFS that presenting new models like UtEB as diffs from IFS can help them understand both better, or the label for 'theory of modeling psychology' in your mind is 'IFS' instead of something broader, but the actual motivation is unclear from the post.

Comment by vaniver on Vaniver's Shortform · 2019-10-29T23:13:09.848Z · score: 14 (4 votes) · LW · GW

One challenge for theories of embedded agency over Cartesian theories is that the 'true dynamics' of optimization (where a function defined over a space points to a single global maximum, possibly achieved by multiple inputs) are replaced by the 'approximate dynamics'. But this means that by default we get the hassles associated with numerical approximations, like when integrating differential equations. If you tell me that you're doing Euler's Method on a particular system, I need to know lots about the system and about the particular hyperparameters you're using to know how well you'll approximate the true solution. This is the toy version of trying to figure out how a human reasons through a complicated cognitive task; you would need to know lots of details about the 'hyperparameters' of their process to replicate their final result.

This makes getting guarantees hard. We might be able to establish what the 'sensible' solution range for a problem is, but establishing what algorithms can generate sensible solutions under what parameter settings seems much harder. Imagine trying to express what the set of deep neural network parameters are that will perform acceptably well on a particular task (first for a particular architecture, and then across all architectures!).
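The Euler's Method point above can be made concrete with a small sketch. The test system (exponential decay, dy/dt = -rate * y), the function names, and the specific step counts are all illustrative assumptions rather than anything from the original comment; the only claim is that the same method with the same step count can be fine on one system and wildly wrong on another.

```python
import math


def euler(f, y0, t0, t1, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with fixed-step forward Euler."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y


def decay(rate):
    """Hypothetical test system: dy/dt = -rate * y, exact solution exp(-rate * t)."""
    return lambda t, y: -rate * y


def abs_error(rate, steps):
    """Absolute error at t = 1 when starting from y(0) = 1."""
    exact = math.exp(-rate)
    return abs(euler(decay(rate), 1.0, 0.0, 1.0, steps) - exact)


if __name__ == "__main__":
    # Same method, different (system, step count) pairs.
    for rate in (1.0, 50.0):
        for steps in (10, 1000):
            print(f"rate={rate:5.1f} steps={steps:5d} error={abs_error(rate, steps):.3e}")
```

With ten steps the slow system is approximated reasonably, while the fast system blows up (each step multiplies the state by 1 - 5 = -4); with a thousand steps both are fine. Knowing only "Euler's Method with ten steps" tells you very little until you also know the system.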

Comment by vaniver on A Critique of Functional Decision Theory · 2019-10-29T23:08:44.769Z · score: 5 (2 votes) · LW · GW
If the problem statement is as described and the FDT agent sees "you'll take the right box" and the FDT agent takes the left box, then it must be the case that this was the unlucky bad prediction and made unlikely accordingly.

See also Nate Soares in Decisions are for making bad outcomes inconsistent. This is sort of a generalization, where 'decisions are for making bad outcomes unlikely.'

Comment by vaniver on On Internal Family Systems and multi-agent minds: a reply to PJ Eby · 2019-10-29T22:23:09.598Z · score: 3 (1 votes) · LW · GW
I'm still not sure what it would mean for humans to actually have subagents, versus to just behave exactly as if they have subagents. I don't know what empirical finding would distinguish between those two theories.

I agree that you shouldn't expect to see any findings that distinguish between those theories. But I thought the main question here was closer to "do humans behave as if they have subagents?", where there's evidence that points in that direction ("I have conflicting desires and moods!") and evidence that points away from that direction ("Anger isn't an agent, if it were you would see Y!").

Comment by vaniver on Ms. Blue, meet Mr. Green · 2019-10-29T22:07:34.109Z · score: 6 (2 votes) · LW · GW

To be clear, I agree with this, which is a reason why I generally try to give the involved explanation that bridges to where (I think) the other person is; in my experience the opportunity rarely evaporates, and even if it does a straightforward refusal like you mention seems more promising.

Comment by vaniver on Two explanations for variation in human abilities · 2019-10-29T21:54:24.068Z · score: 12 (6 votes) · LW · GW

Probably this post; this claim has been highly controversial, with the original blog post citing a 2006 paper that was retracted in 2014, and whose original author wrote a meta-analysis that supported their conclusions in 2009. Here's some previous discussion (in 2012) on LW. Many people have comments to the effect of "bimodal scores are common in education" with relatively few people having citations to back that up, in a way that makes me suspect they're drawing from the original retracted paper.

Comment by vaniver on Ms. Blue, meet Mr. Green · 2019-10-29T16:40:59.356Z · score: 8 (3 votes) · LW · GW
The point, if you like, is that if you’re asked to explain some “woo” or “mysticism” or whatever, and you find yourself sounding like Morpheus sounds in the movie, you’re doing it wrong.

One thing that's interesting about explanations and tutoring is that often, when a student asks a question, two things come up: the answer to the question as they asked it and the question they should have asked instead. On StackOverflow, this gets referred to as the XY problem. The general advice there lines up with yours--both answer the question they asked, and point at how they could know whether or not it's the right question--but it's not obvious to me that this also applies in non-technical contexts, where these moments of frustration with one's model or approach might be the primary opportunities for making progress on the Y problem. If that opportunity would evaporate when you presented a solution to X, then Morpheus's strategy seems better.

Comment by vaniver on Vaniver's Shortform · 2019-10-29T16:27:43.159Z · score: 4 (2 votes) · LW · GW

That's it, thanks!

Comment by vaniver on On Internal Family Systems and multi-agent minds: a reply to PJ Eby · 2019-10-29T16:26:51.145Z · score: 11 (4 votes) · LW · GW
In any case, what I was trying to say was that even if IFS isn’t great reductionism, it’s still a better model than a naïve conception of the mind as a unified whole.

I think there's an important difference between something like "I got insights out of IFS" and "I think people should be taught IFS," and it's not obvious to me whether you're trying to say "historically rationalists got into IFS basically by chance amplified by word-of-mouth recommendations" or "I think rationalists should start learning about this sort of stuff through IFS" or "I think people who got into this through IFS shouldn't abandon it."

This is clarified some by this:

The intention of my comment was descriptive rather than prescriptive; that historically, the IFS model has been popular because it’s pragmatically useful and because, despite its possible flaws, rationalists haven’t been exposed to any better models.

The sense I have from reading through the post is something like "IFS and other approaches seem to be converging, in a way that means differences between them are more subtle / more tied to the overall narrative rather than the active ingredient." If someone is better off thinking of things as parts that have wants, then they should get the IFS-derived version; if not, then not. But this sort of reminds me of my long-standing attempt to figure out what Friston-style models are doing that PCT-style models aren't (and vice versa), where actually the thing at play seems to be a question of "which underlying theory deserves more status?" than "what different predictions do these models make?", in a way that's quite difficult to resolve.

Comment by vaniver on Vaniver's Shortform · 2019-10-28T16:44:36.642Z · score: 9 (4 votes) · LW · GW

I came across some online writing years ago, in which someone considers the problem of a doctor with a superpower, that they can instantly cure anyone they touch. They then talk about how the various genres of fiction would handle this story and what they would think the central problem would be.

Then the author says "you should try to figure out how you would actually solve this problem." [EDIT: I originally had his solution here, but it's a spoiler for anyone who wants to solve it themselves; click rsaarelm's link below to see it in its original form.]

I can't easily find it through Google, but does anyone know what I read / have the link to it?

Comment by vaniver on What are some unpopular (non-normative) opinions that you hold? · 2019-10-23T17:40:41.004Z · score: 4 (2 votes) · LW · GW

I had seen claims that Pete Buttigieg had made calls for mandatory national service, but it turns out his proposal was actually a substantial increase in the number of paid service opportunities.

Comment by vaniver on Why Ranked Choice Voting Isn't Great · 2019-10-23T15:42:36.489Z · score: 5 (2 votes) · LW · GW
I would expect that it wouldn't take many competitive elections before most people would be submitting votes that are as strong as possible.

That seems right; the surprising thing is that people are willing to 'undervote' at all, and that they do so deliberately instead of through ignorance. But it makes sense, especially for downballot candidates where you vaguely suspect parks commissioner candidate A is better than B but don't want to put your full force behind A (because if other people are well-informed on the parks commissioner race, you want them to settle it).

Beyond ignorance, altruism sometimes motivates this, which is easiest to see for simple things like picking what restaurant to go to. If you're mostly indifferent between them but have a weak preference, it makes sense to do a compressed vote if you suspect other people might have strong preferences that you don't want to overwhelm.

Comment by vaniver on Why Ranked Choice Voting Isn't Great · 2019-10-22T01:37:22.349Z · score: 4 (2 votes) · LW · GW

I basically don't buy the Condorcet winner argument, mostly because the utility and disutility of winning or losing isn't fixed. This is one of the reasons why I like score voting (or range voting) so much; candidates who are massively disliked lose heavily, whereas candidates who are broadly liked win, and from the candidate's point of view, increasing your score in the eyes of anyone is useful, regardless of their score for other candidates.

Yes, there are concerns about comparing utilities across people, but people tend to be pretty reasonable about this in the score voting framework. (It's strategic to give your favorite a 10, and your least favorite a 0, but empirically people often compress their scores much more, say giving everyone a rating between 6 and 8, which implicitly makes their vote a fifth as strong.) The main problem is when you add candidates who make differences so large that it dwarfs all other variation (at least among voters who think they have tiered preferences). That is, suppose you have an "Anyone But Trump" voter; their vote that maximizes the chance of someone besides Trump winning is to give Trump a 0 and every other candidate a 10. But now whether Clinton or Kasich wins depends mostly on the people who thought Trump was ok (or it was worth putting Kasich in the "as bad as Trump" camp). This is probably fine for the rare election where there's a surprise Trump, and is not great if there's always a Trump-like candidate running.
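The "a fifth as strong" arithmetic, and the Anyone-But-Trump effect, can be sketched as follows. This is a toy illustration under stated assumptions: a 0 to 10 scale, "strength" measured as the fraction of the scale a ballot's scores span, and hypothetical ballots and function names invented for the example.

```python
def vote_strength(ballot, lo=0, hi=10):
    """Fraction of the available score range that a ballot actually uses."""
    scores = ballot.values()
    return (max(scores) - min(scores)) / (hi - lo)


def pairwise_push(ballot, x, y, lo=0, hi=10):
    """How hard a ballot pushes x over y, as a fraction of the maximum possible."""
    return (ballot[x] - ballot[y]) / (hi - lo)


# Hypothetical ballots on a 0-10 scale.
compressed = {"A": 6, "B": 8, "C": 7}          # everyone rated between 6 and 8
strategic = {"A": 0, "B": 10, "C": 5}          # favorite at 10, least favorite at 0
anyone_but_trump = {"Trump": 0, "Clinton": 10, "Kasich": 10}

if __name__ == "__main__":
    print(vote_strength(compressed))   # spans 2 of 10 points
    print(vote_strength(strategic))    # spans the full range
    # Full force against Trump, but no say at all between Clinton and Kasich.
    print(pairwise_push(anyone_but_trump, "Clinton", "Trump"))
    print(pairwise_push(anyone_but_trump, "Clinton", "Kasich"))
```

The Anyone-But-Trump ballot uses its full strength on one comparison and has none left for the Clinton versus Kasich comparison, which is why that contest ends up decided by other voters.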

Comment by vaniver on Why Ranked Choice Voting Isn't Great · 2019-10-22T01:20:51.983Z · score: 5 (2 votes) · LW · GW
Why are you classifying Approval Voting as cardinal?

Ordinal voting systems force you to express a strict preference between all candidates; cardinal voting systems allow you to have candidates tie. For any election with at least three candidates, there has to be a tie if you're using approval voting.
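The tie claim is a pigeonhole argument: an approval ballot has only two levels (approve or not), so three or more candidates cannot all receive distinct levels. A brute-force check, with function names chosen purely for illustration:

```python
from itertools import product


def has_tie(ballot):
    """True if at least two candidates receive the same mark on this ballot."""
    return len(set(ballot)) < len(ballot)


def every_approval_ballot_has_tie(num_candidates):
    """Enumerate every approve (1) / disapprove (0) ballot and check for ties."""
    return all(has_tie(b) for b in product((0, 1), repeat=num_candidates))


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, every_approval_ballot_has_tie(n))
```

With two candidates a ballot can avoid a tie by approving exactly one of them; from three candidates onward every possible ballot contains a tie.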

Comment by vaniver on Inefficient Doesn’t Mean Indifferent · 2019-10-18T18:17:25.639Z · score: 7 (3 votes) · LW · GW

For a long time, this was my impression as well, but Caplan claims the evidence doesn't bear this out. And many organizations do use IQ testing successfully; the military is a prime example.

Comment by vaniver on Vaniver's Shortform · 2019-10-18T00:04:52.501Z · score: 25 (9 votes) · LW · GW

[Meta: this is normally something I would post on my tumblr, but instead am putting on LW as an experiment.] Sometimes, in games like Dungeons and Dragons, there will be multiple races of sapient beings, with humans as a sort of baseline. Elves are often extremely long-lived, but most handlings of this I find pretty unsatisfying. Here's a new take, that I don't think I've seen before (except the Ell in Worth the Candle have some mild similarities): Humans go through puberty at about 15 and become adults around 20, lose fertility (at least among women) at about 40, and then become frail at about 60. Elves still 'become adults' around 20, in that a 21-year old elf adventurer is as plausible as a 21-year old human adventurer, but they go through puberty at about 40 (and lose fertility at about 60-70), and then become frail at about 120.

This has a few effects:

  • The peak skill of elven civilization is much higher than the peak skill of human civilization (as a 60-year old master carpenter has had only ~5 decades of skill growth, whereas a 120-year old master carpenter has had ~11). There's also much more of an 'apprenticeship' phase in elven civilization (compare modern academic society's "you aren't fully in the labor force until ~25" to a few centuries ago, when it would have happened at 15), aided by them spending longer in the "only interested in acquiring skills" part of 'childhood' before getting to the 'interested in sexual market dynamics' part of childhood.
  • Young elves and old elves are distinct in some of the ways human children and adults are distinct, but not others; the 40-year old elf who hasn't started puberty yet has had time to learn 3 different professions and build a stable independence, whereas the 12-year old human who hasn't started puberty yet is just starting to operate as an independent entity. And so sometimes when they go through puberty, they're mature and stable enough to 'just shrug it off' in a way that's rare for humans. (I mean, they'd still start growing a beard / etc., but they might stick to carpentry instead of this romance bullshit.)
  • This gives elven society something of a huge individualist streak, in that people focused a lot on themselves / the natural world / whatever for decades before getting the kick in the pants that convinced them other elves were fascinating too, and so they bring that additional context to whatever relationships they do build.
  • For the typical human, most elves they come into contact with are wandering young elves, who are actually deeply undifferentiated (sometimes in settings / games you get jokes about how male elves are basically women, but here male elves and female elves are basically undistinguished from each other; sure, they have primary sex characteristics, but in this setting a 30-year old female elf still hasn't grown breasts), and asexual in the way that children are. (And, if they do get into a deep friendship with a human for whom it has a romantic dimension, there's the awkward realization that they might eventually reciprocate the feelings--after a substantial fraction of the human's life has gone by!)
  • The time period that elves spend as parents of young children is about the same as the amount of time that humans spend, but feels much shorter, and still elves normally only see their grandchildren and maybe briefly their great-grandchildren.

This gives you three plausible archetypes for elven adventurers:

  • The 20-year old professional adventurer who's just starting their career (and has whatever motivation).
  • The 45-year old drifter who is still level 1 (because of laziness / lack of focus) who is going through puberty and needs to get rich quick in order to have any chance at finding a partner, and so has turned to adventuring out of desperation.
  • The established 60-year old who has several useless professions under their belt (say, a baker and an accountant and a fisherman) who is now taking up adventuring as career #4 or whatever.

Comment by vaniver on Thoughts on "Human-Compatible" · 2019-10-11T16:42:27.607Z · score: 3 (1 votes) · LW · GW

Oh! Sorry, I missed the "How does this compare with" line.

Comment by vaniver on Thoughts on "Human-Compatible" · 2019-10-10T23:48:54.444Z · score: 7 (3 votes) · LW · GW
yes, but its underlying model is still accurate, even if it doesn't reveal that to us?

This depends on whether it thinks we would approve more of it having an accurate model and deceiving us or having an inaccurate model in the way we want its model to be less accurate. Some algorithmic bias work is of the form "the system shouldn't take in inputs X, or draw conclusions Y, because that violates a deontological rule, and simple accuracy-maximization doesn't incentivize following that rule."

My point is something like "the genius of approval-directed agency is that it grounds out every meta-level in 'approval,' but this is also (potentially) the drawback of approval-directed agency." Specifically, for any potentially good property the system might have (like epistemic accuracy) you need to check whether that actually in-all-cases for-all-users maximizes approval, because if it doesn't, then the approval-directed agent is incentivized to not have that property.

[The deeper philosophical question here is something like "does ethics backchain or forwardchain?", as we're either grounding things out in what we will believe or what we believe now, and approval-direction is more the latter, and CEV-like things are more the former.]

Comment by vaniver on Thoughts on "Human-Compatible" · 2019-10-10T22:48:27.778Z · score: 14 (4 votes) · LW · GW
But, having good predictive accuracy is instrumentally useful for maximizing the reward signal, so we can expect that its implicit representation of the world continually improves (i.e., it comes to find a nice efficient encoding). We don't have to worry about this - the AI is incentivized to get this right.

The AI is incentivized to get this right only in directions that increase approval. If the AI discovers something the human operator would disapprove of learning, it is incentivized to obscure that fact or act as if it didn't know it. (This works both for "oh, here's an easy way to kill all humans" and "oh, it turns out God isn't real.")

Comment by vaniver on Replace judges with Keynesian beauty contests? · 2019-10-07T17:13:54.734Z · score: 4 (2 votes) · LW · GW

Note that 'guilt' and 'innocence' is normally settled by a jury (for serious cases), and that most (interesting) judicial decisions are on cases that don't have a binary outcome, and the reasoning by which they make the decision is an important part of the precedent set by their decision. It seems like this method can still work for that, but this exacerbates concerns that things will be 'decided by the lowest common denominator' instead of whatever the 'legal truth' should be.

Comment by vaniver on Vaniver's Shortform · 2019-10-06T19:34:50.088Z · score: 22 (7 votes) · LW · GW

People's stated moral beliefs are often gradient estimates instead of object-level point estimates. This makes sense if arguments from those beliefs are pulls on the group epistemology, and not if those beliefs are guides for individual action. Saying "humans are a blight on the planet" would mean something closer to "we should be more environmentalist on the margin" instead of "all things considered, humans should be removed."

You can probably imagine how this can be disorienting, and how there's a meta issue where the point estimate view is able to see what it's doing in a way that the gradient view might not be.

Comment by vaniver on Entangled Truths, Contagious Lies · 2019-10-06T19:27:44.729Z · score: 11 (6 votes) · LW · GW

Suppose I have two cards, A and B, that I shuffle and then blindly place in two spaceships, pointed at opposite ends of the galaxy. If they go quickly enough, it can be the case that they get far enough apart that they will never be able to meet again. But if you're in one of the spaceships, and turn the card over to learn that it's card A, then you learn something about the world on the other side of the light cone boundary.

Comment by vaniver on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-05T02:45:11.941Z · score: 5 (2 votes) · LW · GW

That's how I interpreted:

the defensive AI systems designed to protect against rogue AI systems are not akin to the military, they are akin to the police, to law enforcement. Their "jurisdiction" would be strictly AI systems, not humans.

To be clear, I think he would mean it more in the way that there's currently an international police order that is moderately difficult to circumvent, and that the same would be true for AGI, and not necessarily the more intense variants of stabilization (which are necessary primarily if you think offense is highly advantaged over defense, which I don't know his opinion on).

Comment by vaniver on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T21:24:46.421Z · score: 24 (10 votes) · LW · GW

It seems like this is the sort of deep divide that is hard to cross, since I would expect people to have strong opinions based on what they've seen work elsewhere. It has an echo of the previous concern, where Russell needs to somehow point out "look, this time it actually is important to have a theory instead of doing things ad-hoc" in a way that depends on the features of this particular issue rather than the way he likes doing work.

Comment by vaniver on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T21:11:53.627Z · score: 2 (3 votes) · LW · GW

I think 5 is much closer to the "look, the first goal is to build a system that prevents anyone else from building unaligned AGI" claim, and there's a separate claim 6 of the form "more generally, we can use AGI to police AGI" that is similar to debate or IDA. And I think claim 5 is basically in line with what, say, Bostrom would discuss (where stabilization is a thing to do before we attempt to build a sovereign).

Comment by vaniver on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T21:09:24.303Z · score: 7 (3 votes) · LW · GW

I think this scheme doesn't quite catch the abulia trap (where the AGI discovers a way to directly administer itself reward, and then ceases to interact with the outside world), in that it's not clear that the AI learns about the map/territory distinction and to locate its goals in the territory (one way to avoid this) instead of just a prohibition against many sorts of self-modification or reward tampering (which avoids this until it comes up with a clever new approach).

Comment by vaniver on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T16:29:34.698Z · score: 18 (11 votes) · LW · GW

[Context: the parent comment was originally posted to the Alignment Forum, and was moved to only be visible on LW.]

One of my hopes for the Alignment Forum, and to a much lesser extent LessWrong, is that we manage to be a place where everyone relevant to AI alignment gets value from discussing their work. There are many obstacles to that, but one that I've been thinking about a lot recently is that pointing at foundational obstacles can look a lot like low-effort criticism.

That is, I think there's a valid objection here of the form "these people are using reasoning style A, but I think this problem calls for reasoning style B because of considerations C, D, and E." But the inferential distance here is actually quite long, and it's much easier to point out "I am not convinced by this because of <quick pointer>" than it is to actually get the other person to agree that they were making a mistake. And beyond that, there's the version that scores points off an ingroup/outgroup divide and a different version that tries to convert the other party.

My sense is that lots of technical AI safety agendas look to each other like they have foundational obstacles, of the sort that means having more than one agenda happy at the Alignment Forum means everyone needs to not do this sort of sniping, while still having high-effort places to discuss those obstacles. (That is, if we think CIRL can't handle corrigibility, having a place for 'obstacles to CIRL' where that's discussed makes sense, but bringing it up at every post on CIRL might not.)

Comment by vaniver on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T16:16:43.286Z · score: 58 (27 votes) · LW · GW

There's a dynamic that's a normal part of cognitive specialization of labor, where the work other people are doing is "just X"; imagine trying to create a newspaper, for example. Most people will think of writing articles as "just journalism"; you pay journalists whatever salary, they do whatever work, and you get articles for your newspaper. Similarly the accounting is "just accounting," and so on. But the journalist can't see journalism as "just journalism"; if their model of how to write articles is "money goes in, article comes out" they won't be able to write any articles. Instead they have lots of details about how to write articles, which includes what articles are and aren't easy.

You could view both sides as doing something like this: the person who's trying to make safeguards is saying "look, you can't say 'just add safeguards', these things are really difficult" and the person who's trying to make something worth safeguarding is saying "look, you can't say 'just build an autonomous superintelligence', these things are really difficult." (Especially since I think LeCun views them as too difficult to try to do, and instead is just trying to get some subcomponents.)

I think that's part of what's going on, but mostly in how it seems to obscure the core issue (according to me), which is related to Yoshua's last point: "what safeguards we need when" is part of the safeguard science that we haven't done yet. I think we're in a situation where many people say "yes, we'll need safeguards, but it'll be easy to notice when we need them and implement them when we notice" and the people trying to build those safeguards respond with "we don't think either of those things will be easy." But notice how, in the backdrop of "everyone thinks their job is hard," this statement provides very little ability to distinguish between worlds where this actually is a crisis and worlds where things will be fine!

Comment by vaniver on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T04:58:58.377Z · score: 24 (10 votes) · LW · GW
I just attended an NAS meeting on climate control systems, where the consensus was that it was too dangerous to develop, say, solar radiation management systems - not because they might produce unexpected disastrous effects but because the fossil fuel corporations would use their existence as a further form of leverage in their so-far successful campaign to keep burning more carbon.

Unrelated to the primary point, but how does this make sense? If geoengineering approaches successfully counteract climate change, and it's cheaper to burn carbon and dim the sun than generate power a different way (or not use the power), then presumably civilization is better off burning carbon and dimming the sun.

It looks to me like the argument is closer to "because the fossil fuel corporations are acting adversarially to us, we need to act adversarially to them," or expecting that instead of having sensible engineering or economic tradeoffs, we'll choose 'burn carbon and dim the sun' even if it's more expensive than other options, because we can't coordinate on putting the costs in the right place.

Which... maybe I buy, but this looks to me like net-negative environmentalism again (like anti-nuclear environmentalism).

Comment by vaniver on Connectome-specific harmonic waves and meditation · 2019-09-30T22:35:46.455Z · score: 11 (6 votes) · LW · GW

I talked with Mike Johnson a bunch about this at a recent SSC meetup, and think that CSHW are a cool way to look at brain activity but that associating them directly with valence of experience (the simple claim "harmonic CSHW ≡ good") has a bunch of empirical consequences that seem probably false to me. (This is a good thing, in many respects, because it points at a series of experiments that might convince one of us!)

An observation is that I think this is a 'high level' or 'medium level' description of what's going on in the brain, in a way that makes it sort of difficult to buy as a target. If I think about meditation as something like having one thing on the stack, or as examining your code to refactor it, or directing your attention at itself, then I can see what's going on in a somewhat clear way. And it's easy to see how having one thing on the stack might increase the harmony (defined as a statistical property of a distribution of energies in the CSHW), but the idea that the goal was to increase the harmony and having one thing on the stack just happens to do so seems unsupported.

I do like that this has an answer for 'dark room' objections that seems superior to the normal 'priors' story for Friston-style approaches, in that you're trying to maximize a property (tho you still smuggle in the goals through the arrangement of the connectome, but that's fine because they had to come from somewhere).

Meditation, and anything that sets up harmonic neuronal oscillation, makes brain activity more symmetric, hence better or good.

I think this leap is bigger than it might seem, because it's not clear that you have control loops on the statistical properties of your brain as a whole. It reads something like a type error that's equivocating between individual loops and the properties of many loops.

Now, it may turn out that 'simplicity' is the right story here, where harmony / error-minimization / etc. are just very simple things to build and so basically every level of the brain operates on that sort of principle. In a draft of the previous paragraph I had a line that said "well, but it's not obvious that there's a control loop operating on the control loops that has this sort of harmony as an observation" but then I thought "well, you could imagine this is basically what consciousness / the attentional system is doing, or that this is true for boring physical reasons where the loops are all swimming in the same soup and prefer synchronization."

But this is where we need to flesh out some implementation details and see if it makes the right sorts of predictions. In particular, I think a 'multiple drives' model makes sense, and lines up easily with the hierarchical control story, but I didn't see a simple way that it also lines up with the harmony story. (In particular, I think lots of internal conflicts make sense as two drives fighting over the same steering wheel, but a 'maximize harmony' story needs to have really strong boundary conditions to create the same sorts of conflicts. Now, really strong boundary conditions is pretty sensible, but still makes it sort of weird as a theory of long-term arrangement, because you should expect the influence of the boundary conditions to be something the long-term arrangement can adjust.)

Comment by vaniver on Long-term Donation Bunching? · 2019-09-28T03:09:28.679Z · score: 4 (2 votes) · LW · GW

This 'works' except for the fact that any sort of enforceable contract (that, in year 6, it will eventually get around to you) will mean they are no longer gifts (and thus aren't considered personal gifts underneath the relevant threshold). But even if it doesn't get around to you, this is an improvement over not having anything to deduct yourself.

Comment by vaniver on Long-term Donation Bunching? · 2019-09-27T16:08:09.745Z · score: 4 (4 votes) · LW · GW

It's also inconvenient for charities to have variable income streams instead of dependable donors (altho this is a risk you'll be facing anyway if someone is frequently re-evaluating donation targets), but you can work around this by having a donor-advised fund. Donate to the fund in year 1, collect the deduction, and then disburse money from the fund at a constant pace each year (until you hit year N, at which point you donate again).
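As a rough sketch of why bunching helps, here is the arithmetic under a deliberately simplified model: each year you deduct the larger of a standard deduction and your itemized donations. The dollar figures and the function name are hypothetical, and real tax rules (caps, other itemized deductions, rule changes) are ignored.

```python
def total_deductions(donations_by_year, standard_deduction):
    """Sum of per-year deductions when each year takes max(standard, itemized).

    Simplifying assumption: donations are the only itemized deduction.
    """
    return sum(max(standard_deduction, d) for d in donations_by_year)


# Hypothetical numbers, chosen only for illustration.
STANDARD = 12_000
annual = [5_000] * 5                # give the same amount every year
bunched = [25_000, 0, 0, 0, 0]      # give five years' worth to the fund in year 1

print(total_deductions(annual, STANDARD))   # donations never exceed the standard deduction
print(total_deductions(bunched, STANDARD))  # year 1 itemizes, years 2-5 take the standard
```

In this toy model the charity still sees the same steady stream in both cases, because the fund pays out at a constant pace; only the timing of the donor's deduction changes.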

Comment by vaniver on The AI Timelines Scam · 2019-09-17T22:14:14.206Z · score: 14 (3 votes) · LW · GW

I haven't had time to write my thoughts on when strategy research should and shouldn't be public, but I note that this recent post by Spiracular touches on many of the points that I would touch on in talking about the pros and cons of secrecy around infohazards.

The main claim that I would make about extending this to strategy is that strategy implies details. If I have a strategy that emphasizes that we need to be careful around biosecurity, that implies technical facts about the relative risks of biology and other sciences.

For example, the US developed the Space Shuttle with a justification that didn't add up (ostensibly it would save money, but it was obvious that it wouldn't). The Soviets, trusting in the rationality of the US government, inferred that there must be some secret application for which the Space Shuttle was useful, and so developed a clone (so that when the secret application was unveiled, they would be able to deploy it immediately instead of having to build their own shuttle from scratch then). If in fact an application like that had existed, it seems likely that the Soviets could have found it by reasoning through "what do they know that I don't?" when they might not have found it by reasoning from scratch.

Comment by vaniver on Meetups as Institutions for Intellectual Progress · 2019-09-17T16:46:08.793Z · score: 13 (8 votes) · LW · GW
I’d potentially like to generate a default meetup topic for each month, so that lots of people all over are talking about the same thing at the same time. This could make discussion on both LessWrong (via the write-ups) and the Facebook group more focused, and thus hopefully move the community's conversation around those topics forward (trying to avoid the failure mode of retreading the same ground).

I predict that this will be better the closer this is to "deeply consume content X" than "generate content X". I also suspect something like... 'retreading the same ground' is actually good, in the context of meetups, so long as it's on a sufficiently long schedule that there's something new to come back to, and the content is 'evergreen' in the right way. (Specifically, meetups regularly have member turnover, such that you can't all read the Sequences together in 2014 and then everyone knows the Sequences.)

It also might be neat if we could generate meetup-exclusive content, such that actually showing up in person gives you some access that you can't get otherwise, adding some scarcity to (actually!) increase the value. But where that content will come from is, of course, an obstacle.

Comment by vaniver on A Critique of Functional Decision Theory · 2019-09-14T00:28:59.268Z · score: 9 (4 votes) · LW · GW
In particular, the old name for this, 'updatelessness', threw me for a loop for a while because it sounded like the dumb "don't take input from your environment" instead of the conscious "consider what impact you're having on hypothetical versions of yourself".

As a further example, consider glomarization. If I haven't committed a crime, pleading the fifth is worse than pleading innocence; however it means that when I have committed a crime, I have to either pay the costs of pleading guilty, pay the costs of lying, or plead the fifth (which will code to "I'm guilty", because I never say it when I'm innocent). If I care about honesty and being difficult to distinguish from the versions of myself who commit crimes, then I want to glomarize even before I commit any crimes.
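One way to see the "will code to 'I'm guilty'" point is as a Bayesian update by the listener. The sketch below is only illustrative: the prior and the policy probabilities are made-up inputs, and the function name is hypothetical.

```python
from fractions import Fraction


def p_guilty_given_fifth(prior_guilty, p_fifth_if_guilty, p_fifth_if_innocent):
    """Posterior that the speaker is guilty after hearing them plead the fifth."""
    num = prior_guilty * p_fifth_if_guilty
    den = num + (1 - prior_guilty) * p_fifth_if_innocent
    return num / den


prior = Fraction(1, 10)  # hypothetical prior probability of guilt

# Policy 1: plead the fifth only when guilty; pleading it reveals everything.
print(p_guilty_given_fifth(prior, Fraction(1), Fraction(0)))

# Policy 2: glomarize, i.e. plead the fifth whether guilty or innocent;
# pleading it reveals nothing beyond the prior.
print(p_guilty_given_fifth(prior, Fraction(1), Fraction(1)))
```

Under the first policy the plea is fully informative; under the second the listener's belief stays at the prior, which is the benefit being bought by glomarizing while innocent.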

Comment by vaniver on G Gordon Worley III's Shortform · 2019-09-14T00:15:34.972Z · score: 23 (8 votes) · LW · GW

There's a dynamic here that I think is somewhat important: socially recognized gnosis.

That is, contemporary American society views doctors as knowing things that laypeople don't know, and views physicists as knowing things that laypeople don't know, and so on. Suppose a doctor examines a person and says "ah, they have condition X," and Amy responds with "why do you say that?", and the doctor responds with "sorry, I don't think I can generate a short enough explanation that is understandable to you." It seems like the doctor's response to Amy is 'socially justified', in that the doctor won't really lose points for referring to a pre-existing distinction between those-in-the-know and laypeople (except maybe for doing it rudely or gracelessly). There's an important sense in which society understands that it in fact takes many years of focused study to become a physicist, and physicists should not be constrained by 'immediate public justification' or something similar.

But then there's a social question, of how to grant that status. One might imagine that we want astronomers to be able to do their astronomy and have their unintelligibility be respected, while we don't want to respect the unintelligibility of astrologers.

So far I've been talking 'nationally' or 'globally' but I think a similar question holds locally. Do we want it to be the case that 'rationalists as a whole' think that meditators have gnosis and that this is respectable, or do we want 'rationalists as a whole' to think that any such respect is provisional or 'at individual discretion' or a mistake?

That is, when you say:

I don't consider this a problem, but I also recognize that within some parts of the rationalist community that is considered a problem (I model you as being one such person, Duncan).

I feel hopeful that we can settle whether or not this is a problem (or at least achieve much more mutual understanding and clarity).

So it is true that I can't provide adequate episteme of my claim, and maybe that's what you're reacting to.

This feels like the more important part ("if you don't have episteme, why do you believe it?") but I think there's a nearly-as-important other half, which is something like "presenting as having respected gnosis" vs. "presenting as having unrespected gnosis." If you're like "as a doctor, it is my considered medical opinion that everyone has spirituality", that's very different from "look, I can't justify this and so you should take it with a grain of salt, but I think everyone secretly has spirituality". I don't think you're at the first extreme, but I think Duncan is reacting to signals along that dimension.

Comment by vaniver on A Critique of Functional Decision Theory · 2019-09-13T21:53:55.041Z · score: 29 (12 votes) · LW · GW

(I work at MIRI, and edited the Cheating Death in Damascus paper, but this comment wasn't reviewed by anyone else at MIRI.)

This should be a constraint on any plausible decision theory.

But this principle prevents you from cooperating with yourself across empirical branches in the world!

Suppose a good predictor offers you a fair coin flip at favorable odds (say, 2 of their dollars to one of yours). If you called correctly, you can either forgive (no money moves) or demand; if you called incorrectly, you can either pay up or back out. The predictor only responds to your demand that they pay up if they predict that you would yourself pay up when you lose, but otherwise this interaction doesn't affect the rest of your life.

You call heads, the coin comes up tails. The Guaranteed Payoffs principle says:

You're certain that you're in a world where you will just lose a dollar if you pay up, and will lose no dollars if you don't pay up. It maximizes utility conditioned on this starting spot to not pay up.

The FDT perspective is to say:

The price of winning $2 in half of the worlds is losing $1 in the other half of the worlds. You want to be the sort of agent who can profit from these sorts of bets and/or you want to take this opportunity to transfer utility across worlds, because it's net profitable.
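The two perspectives can be compared numerically by evaluating each policy from before the flip. This is a minimal sketch of the bet as described above; the `predictor_accuracy` parameter is my addition (the comment only says "a good predictor"), and the function name is hypothetical.

```python
from fractions import Fraction


def expected_value(pays_when_losing, predictor_accuracy=Fraction(1)):
    """Expected dollars for a fixed policy, evaluated before the coin is flipped.

    Assumes the agent always demands payment on a win, and that the predictor
    honors the demand only if it predicts the agent would pay up on a loss.
    """
    p_win = Fraction(1, 2)  # fair coin
    if pays_when_losing:
        p_predicted_payer = predictor_accuracy
    else:
        p_predicted_payer = 1 - predictor_accuracy
    win_payoff = 2 * p_predicted_payer          # $2 only if the demand is honored
    lose_payoff = -1 if pays_when_losing else 0  # paying up costs $1
    return p_win * win_payoff + (1 - p_win) * lose_payoff


print(expected_value(pays_when_losing=True))   # policy the FDT perspective endorses
print(expected_value(pays_when_losing=False))  # policy that backs out after losing
```

With a perfect predictor the paying policy comes out ahead; in this model the two policies tie when the predictor is right three quarters of the time, and backing out only wins below that.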

Note that the Bomb case is one in which we condition on the 1 in a trillion trillion failure case, and ignore the 999999999999999999999999 cases in which FDT saves $100. This is like pointing at people who got into a plane that crashed and saying "what morons, choosing to get on a plane that would crash!" instead of judging their actions from the state of uncertainty that they were in when they decided to get on the plane.

This is what Abram means when he says "with respect to the prior of the decision problem"; not that the FDT agent is expected to do well from any starting spot, but from the 'natural' one. (If the problem statement is as described and the FDT agent sees "you'll take the right box" and the FDT agent takes the left box, then it must be the case that this was the unlucky bad prediction and made unlikely accordingly.) It's not that the FDT agent wanders through the world unable to determine where it is even after obtaining evidence; it's that as the FDT agent navigates the world it considers its impact across all (connected) logical space instead of just immediately downstream of itself. Note that in my coin flip case, FDT is still trying to win the reward when the coin comes up heads even though in this case it came up tails, as opposed to saying "well, every time I see this problem the coin will come up tails, therefore I shouldn't participate in the bet."

[I do think this jump, from 'only consider things downstream of you' to 'consider everything', does need justification and I think the case hasn't been as compelling as I'd like it to be. In particular, the old name for this, 'updatelessness', threw me for a loop for a while because it sounded like the dumb "don't take input from your environment" instead of the conscious "consider what impact you're having on hypothetical versions of yourself".]

But then, it seems to me, that FDT has lost much of its initial motivation: the case for one-boxing in Newcomb’s problem didn’t seem to stem from whether the Predictor was running a simulation of me, or just using some other way to predict what I’d do.

It seems to me like either you are convinced that the predictor is using features you can control (based on whether or not you decide to one-box) or features you can't control (like whether you're English or Scottish). If you think the latter, you two-box (because regardless of whether the predictor is rewarding you for being Scottish or not, you benefit from the $1000), and if you think the former you one-box (because you want to move the probability that the predictor fills the large box).
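A small sketch of that split, assuming the standard $1,000,000 for the large box (the comment only mentions the $1000) and made-up fill probabilities. The only difference between the two cases is whether the probability that the large box is filled depends on the action.

```python
SMALL = 1_000      # from the comment
LARGE = 1_000_000  # assumption: the usual Newcomb amount


def best_action(p_filled_if_one_box, p_filled_if_two_box):
    """Return the action with the higher expected payoff, plus both payoffs."""
    evs = {
        "one-box": p_filled_if_one_box * LARGE,
        "two-box": p_filled_if_two_box * LARGE + SMALL,
    }
    return max(evs, key=evs.get), evs


# Predictor keys on features you control: your choice moves the fill probability.
print(best_action(p_filled_if_one_box=0.99, p_filled_if_two_box=0.01))

# Predictor keys on features you can't control (English vs. Scottish):
# the fill probability is the same whatever you choose.
print(best_action(p_filled_if_one_box=0.5, p_filled_if_two_box=0.5))
```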

According to me, the simulation is just a realistic way to instantiate an actual dependence between the decision I'm making now and the prediction. (Like, when we have AIs we'll actually be able to put them in Newcomb-like scenarios!) If you want to posit a different, realistic version of that, then FDT is able to handle it (and the difficulty is all in moving from the English description of the problem to the subjunctive dependency graph).

Now, because there’s an agent making predictions, the FDT adherent will presumably want to say that the right action is one-boxing.

I don't think this is right; I think this is true only if the FDT agent thinks that S (a physically verifiable fact about the world, like the lesion) is logically downstream of its decision. In the simplest such graph I can construct, S is still logically upstream of the decision; are we making different graphs?

But it’s very implausible that there’s some S such that a tiny change in its physical makeup should affect whether one ought to one-box or two-box.

I don't buy this as an objection; decisions are often discontinuous. Suppose I'm considering staying at two different hotels, one with price A and the other with price B with B<A; then construct a series of changes to A that moves it imperceptibly, and at some point my decision switches abruptly from staying at hotel B to staying at hotel A. Whenever you pass multiple continuous quantities through an argmin or argmax, you can get sudden changes.

(Or, put a more analogous way, you can imagine insurance against an event with probability p, and we smoothly vary p, and at some point our action discontinuously jumps from not buying the insurance to buying the insurance.)
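Both examples can be sketched in a few lines. The prices, loss, and premium are hypothetical, and the insurance buyer is assumed to simply minimize expected cost.

```python
def cheaper_hotel(price_a, price_b):
    """Argmin over two prices (in cents, to avoid float noise)."""
    prices = {"A": price_a, "B": price_b}
    return min(prices, key=prices.get)


def buy_insurance(p, loss, premium):
    """Buy exactly when the expected uninsured loss exceeds the premium."""
    return p * loss > premium


# Move hotel A's price smoothly past hotel B's fixed price of $100.00.
for price_a in (10_030, 10_010, 9_990, 9_970):
    print(price_a, cheaper_hotel(price_a, 10_000))

# Move the event probability smoothly past premium / loss = 0.05.
for p in (0.03, 0.049, 0.051, 0.07):
    print(p, buy_insurance(p, loss=1_000, premium=50))
```

The inputs change by small steps, but the output of the argmin flips all at once when the ordering of the options changes.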

Comment by vaniver on Raemon's Scratchpad · 2019-09-01T19:26:09.879Z · score: 8 (3 votes) · LW · GW
It may take multiple years to find a group house where everyone gets along with everyone. I think it makes sense, earlier on, to focus on exploring (i.e. if you've just moved to the Bay, don't worry about getting a group house culture that is a perfect fit), but within 3 years I think it's achievable for most people to have found a group house that is good for friendship.

A thing that I have seen work well here is small houses nucleating out of large houses. If you're living in a place with >20 people for 6 months, probably you'll make a small group of friends that want similar things, and then you can found a smaller place with less risk. But of course this requires there being big houses that people can move into and out of, and that don't become the lowest-common-denominator house that people can't form friendships in because they want to avoid the common spaces.

But of course the larger the house, the harder it is to get off the ground, and a place with deliberately high churn represents even more of a risk.

Comment by vaniver on How does one get invited to the alignment forum? · 2019-08-28T05:59:02.309Z · score: 5 (2 votes) · LW · GW
I just got approved for the Alignment Forum. I don't suppose you could explain why I was approved? I had others ask me about what gets someone approved.

Basically, in the runup to MSFP blog post day I reviewed a bunch of the old applications and approved three or four people, if I remember correctly. The expected pathways are something like "write good comments on LW and get them promoted to the Alignment Forum" or "be someone whose name I recognize because of the AI alignment work (and think it's good)" or "come to an event where we think attendees should get membership."

Comment by vaniver on Vaniver's View on Factored Cognition · 2019-08-23T21:33:41.109Z · score: 3 (1 votes) · LW · GW

When I imagine them they are being initiated by some unit higher in the hierarchy. Basically, you could imagine having a tree of humans that is implementing a particular search process, or a different tree of humans implementing a search over search processes, with the second perhaps being more capable (because it can improve itself) but also perhaps leading to inner alignment problems.

Comment by vaniver on Vaniver's View on Factored Cognition · 2019-08-23T04:13:19.116Z · score: 6 (3 votes) · LW · GW
It wouldn't surprise me if I was similarly confused now, tho hopefully I am less so, and you shouldn't take this post as me speaking for Paul.

This post was improved some by a discussion with Evan which crystallized some points as 'clear disagreements' instead of me being confused, but I think there are more points to crystallize further in this way. It was posted tonight in the state it's in as part of MSFP 2019's blog post day, but might get edited more tomorrow or perhaps will get further elaborated in the comments section.

Comment by vaniver on Thoughts from a Two Boxer · 2019-08-23T01:37:09.710Z · score: 9 (5 votes) · LW · GW
One of my biggest open questions in decision theory is where this line between fair and unfair problems should lie.

I think the current piece that points at this question most directly is Success-First Decision Theories by Preston Greene.

At this point I am not convinced any problem where agents in the environment have access to our decision theory's source code or copies of our agent are fair problems. But my impression from hearing and reading what people talk about is that this is a heretical position.

It seems somewhat likely to me that agents will be reasoning about each other using access to source code fairly soon (if just human operators evaluating whether or not to run intelligent programs, or what inputs to give to those programs). So then the question is something like: "what's the point of declaring a problem unfair?", to which the main answer seems to be "to spend limited no free lunch points." If I perform poorly on worlds that don't exist in order to perform better on worlds that do exist, that's a profitable trade.

Which leads to this:

I disagree with this view and see Newcomb's problem as punishing rational agents.
My big complaint with mind reading is that there just isn't any mind reading.

One thing that seems important (for decision theories implemented by humans or embedded agents, as distinct from decision theories implemented by Cartesian agents) is whether or not the decision theory is robust to ignorance / black swans. That is, if you bake into your view of the world that mind reading is impossible, then you can be durably exploited by any actual mind reading (whereas having some sort of ontological update process or low probability on bizarre occurrences allows you to only be exploited a finite number of times).
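The "durably exploited" versus "exploited a finite number of times" contrast can be illustrated with a plain Bayesian update. This is a toy model with assumed numbers: a mind reader predicts you correctly 99% of the time, a non-reader 50% of the time, and each observed correct prediction counts as one round of exploitation.

```python
from fractions import Fraction


def update(belief, lik_if_reader=Fraction(99, 100), lik_if_chance=Fraction(1, 2)):
    """Posterior probability of 'mind reading is real' after one correct prediction."""
    num = belief * lik_if_reader
    return num / (num + (1 - belief) * lik_if_chance)


def rounds_until_convinced(prior, threshold=Fraction(1, 2), max_rounds=1000):
    """Number of correct predictions observed before belief exceeds the threshold.

    Returns None if the threshold is never crossed within max_rounds.
    """
    belief = prior
    for n in range(1, max_rounds + 1):
        belief = update(belief)
        if belief > threshold:
            return n
    return None


# Baking in impossibility: a prior of exactly zero never moves.
print(rounds_until_convinced(Fraction(0)))

# A low but nonzero prior gets there after finitely many rounds.
print(rounds_until_convinced(Fraction(1, 10**6)))
```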

But note the connection to the earlier bit--if something is actually impossible, then it feels costless to give up on it in order to perform better in the other worlds. (My personal resolution to counterfactual mugging, for example, seems to rest on an underlying belief that it's free to write off logically inconsistent worlds, in a way that it's not free to write off factually inconsistent worlds that could have been factually consistent / are factually consistent in a different part of the multiverse.)