Comment by danielfilan on The AI Timelines Scam · 2019-07-12T07:07:53.432Z · score: 16 (7 votes) · LW · GW
  • Doesn't engage with the post's arguments.
  • I think that it's wrong to assume that the prior on 'short' vs 'long' timelines should be 50/50.
  • I think that it's wrong to just rely on a prior, when it seems like one could obtain relevant evidence.
Comment by danielfilan on DanielFilan's Shortform Feed · 2019-07-04T22:44:38.428Z · score: 29 (6 votes) · LW · GW

The Indian grammarian Pāṇini wanted to exactly specify what Sanskrit grammar was in the shortest possible length. As a result, he did some crazy stuff:

Pāṇini's theory of morphological analysis was more advanced than any equivalent Western theory before the 20th century. His treatise is generative and descriptive, uses metalanguage and meta-rules, and has been compared to the Turing machine wherein the logical structure of any computing device has been reduced to its essentials using an idealized mathematical model.

There are two surprising facts about this:

  1. His grammar was written in the 4th century BC.
  2. People then failed to build on this machinery to do things like formalise the foundations of mathematics, formalise a bunch of linguistics, or even do the same thing for languages other than Sanskrit, in a way that is preserved in the historical record.

I've been obsessing about this for the last few days.

Comment by danielfilan on steven0461's Shortform Feed · 2019-06-30T18:01:48.080Z · score: 4 (2 votes) · LW · GW

Maybe Good Judgement Open? I don't know how they actually get their probabilities though.

Comment by danielfilan on Is there a guide to 'Problems that are too fast to Google'? · 2019-06-18T07:32:04.565Z · score: 4 (3 votes) · LW · GW

First aid seems very close to this category, consisting of immediate assistance to an injured person. The major differences are that (a) it's specific to physical injuries and (b) it involves things one person can do to help another, rather than things one should do to help oneself.

I've taken first aid training in Berkeley, California, and the guide to CPR was helpful, although the rest seemed to be mostly about meeting legal requirements and not that effective in actually teaching stuff (as evidenced by me not remembering it).

Comment by danielfilan on Is there a guide to 'Problems that are too fast to Google'? · 2019-06-18T07:26:52.043Z · score: 2 (1 votes) · LW · GW

Judo also recommends slapping the ground - see e.g. this link

Comment by danielfilan on Conditions for Mesa-Optimization · 2019-06-05T20:49:57.630Z · score: 7 (4 votes) · LW · GW

To see this, we can think of optimization power as being measured in terms of the number of times the optimizer is able to divide the search space in half—that is, the number of bits of information provided.

This is pretty confusing for me: If I'm doing gradient descent, how many times am I halving the entire search space? (although I appreciate that it's hard to come up with a better measure of optimisation)
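
For reference, the measure I understand the quoted passage to be describing is something like the following toy sketch (the function name and numbers are mine, not from the post):

```python
import math

def bits_of_optimisation(search_space_size: int, selected_outcomes: int) -> float:
    """Number of times the search space is halved in order to land inside
    the set of outcomes the optimiser actually selects."""
    return math.log2(search_space_size / selected_outcomes)

# e.g. narrowing 1024 candidate parameter settings down to the best 4
# corresponds to 8 bits of optimisation.
print(bits_of_optimisation(1024, 4))  # 8.0
```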

Comment by danielfilan on Conditions for Mesa-Optimization · 2019-06-05T20:47:47.566Z · score: 5 (3 votes) · LW · GW

AFAICT, algorithmic range isn't the same thing as model capacity: I think that tabular learners have low algorithmic range, as the terms are used in this post, but high model capacity.

Comment by danielfilan on Risks from Learned Optimization: Introduction · 2019-05-31T02:46:15.118Z · score: 17 (9 votes) · LW · GW

Another example of trained optimisers that is imo worth checking out is Value Iteration Networks.

Comment by danielfilan on Totalitarian ethical systems · 2019-05-14T17:59:21.723Z · score: 4 (2 votes) · LW · GW

I guess I'd first like to disagree with the implication that using a single metric implies collapsing everything into a single metric, without getting curious about details and causal chains. The latter seems bad, for the reasons that you've mentioned, but I think there are reasons to like the former. Those reasons:

  • Many comparisons have a large number of different features. Choosing a single metric that's a function of only some features can make the comparison simpler by stopping you from considering features that you consider irrelevant, and inducing you to focus on features that are important for your decision (e.g. "gardening looks strictly better than charter cities because it makes me more productive, and that's the important thing in my metric - can I check if that's actually true, or quantify that?").
  • Many comparisons have a large number of varying features. If you think that by default you have biases or, more generally, unendorsed subroutines that cause you to focus on features you shouldn't, it can be useful to think about them when constructing a metric, and then to use the metric in a way that 'crowds out' relevant biases (e.g. you might tie yourself to using QALYs if you're worried that by default you'll tend to favour interventions that help people of your own ethnicity more than you would consciously endorse). See Hanson's recent discussion of simple rules vs the use of discretion.
  • By having your metric be a function of a comparatively small number of features, you give yourself the ability to search the space of things you could possibly do by how those things stack up against those features, focussing the options you consider on things that you're more likely to endorse (e.g. "hmm, if I wanted to maximise QALYs, what jobs would I want to take that I'm not currently considering?" or "hmm, if I wanted to maximise QALYs, what systems in the world would I be interested in affecting, and what instrumental goals would I want to pursue?"). I don't see how to do this without, if not a single metric, then a small number of metrics.
  • Metrics can crystallise tradeoffs. If I'm regularly thinking about different interventions that affect the lives of different farmed animals, then after making several decisions, it's probably computationally easier for me to come up with a rule for how I tend to trade off cow effects vs sheep effects, and/or freedom effects vs pain reduction effects, than to make that tradeoff every time independently.
  • Metrics help with legibility. This is less important in the case of an individual choosing career options to take, but suppose that I want to be GiveWell, and recommend charities I think are high-value, or I want to let other people who I don't know very well invest in my career. In that case, it's useful to have a legible metric that explains what decisions I'm making, so that other people can predict my future actions better, and so that they can clearly see reasons for why they should support me.
Comment by danielfilan on Totalitarian ethical systems · 2019-05-10T02:03:35.250Z · score: 2 (1 votes) · LW · GW

Profit is a helpful unifying decision metric, but it's not actually good to literally just maximize profits, this leads in the long run to destructive rent-seeking, regulatory capture, and trying to maximize negative externalities.

Agreed. That being said, it does seem like the frame in which it's important to evaluate global states of the business using the simple metric of profit is also right: like, maybe you also need strategic vision and ethics, but if you're not assessing expected future profits, it certainly seems to me that you're going to miss some things and go off the rails. [NB: I am more tied to the personal impact example than the business example, so I'd like to focus discussion in that thread, if it continues].

Comment by danielfilan on Tales From the American Medical System · 2019-05-10T01:54:16.962Z · score: 18 (8 votes) · LW · GW

Your friend has a deadly disease that requires regular doctor visits and prescriptions.

I think that this is a sketchy way to phrase this. Presumably, what a disease requires is a cure (or one of several cures). 'Doctor visits' and 'prescriptions' are one system society can have to assign cures to people, but there could also be other systems, like 'you get to walk to a store and buy insulin if you want some without needing anybody's seal of approval, and you can also see somebody to advise you on how much insulin to take'. Saying that the disease requires regular doctor visits and prescriptions seems to me to rhetorically imply that the costs associated with those are due to the disease, not due to the health care system, without doing the work of checking how well the system works (after all, if the system were about as good as we could manage, the costs really would be due to the disease).

Comment by danielfilan on Totalitarian ethical systems · 2019-05-08T03:38:52.377Z · score: 14 (4 votes) · LW · GW

Re: the section on coming up with simple metrics to evaluate global states, which I couldn't quickly figure out how to nicely excerpt:

I tentatively disagree with the claim that "Only if you go all the way to the extreme of total central planning do you really need a single totalizing metric", at least the way I think 'totalizing' is being applied. As a human in the world, I can see a few cool things I could potentially do: I could continue my PhD and try to do some important research in AI alignment, I could try to get involved with projects to build charter cities, I could try to advocate for my city to adopt policies that I think are good for local flourishing, I could try to give people info that makes it easier for them to eat a vegan diet, or I could make a nice garden. Since I can't do all of these, I need some way to pick between them. One important way is how productive I would be at each activity (as measured by the extent to which I can get the activity done), but I think that for many of these my productivity is about in the same ballpark. To compare between these different activities, it seems like it's really useful to have a single metric on the future history of the world that can trade off the different bits of the world that these activities affect. Similarly, if I'm running a business, it's hard to understand how I could make do without the single metric of profit to guide my decisions.

Comment by danielfilan on Coordination Surveys: why we should survey to organize responsibilities, not just predictions · 2019-05-08T00:09:56.652Z · score: 11 (6 votes) · LW · GW

Causing people to change their behaviour to your favourite behaviour by means other than adding safe input to people's rational deliberation processes seems questionable. Causing people to learn more about the world and give them the opportunity to change their behaviour if they feel it's warranted by the state of the world seems good. This post seems like it's proposing the latter to me - if you disagree, could you point out why?

Comment by danielfilan on DanielFilan's Shortform Feed · 2019-05-02T19:58:26.356Z · score: 13 (5 votes) · LW · GW

I often see (and sometimes take part in) discussion of Facebook here. I'm not sure whether when I partake in these discussions I should disclaim that my income is largely due to Good Ventures, whose money largely comes from Facebook investments. Nobody else does this, so shrug.

Comment by danielfilan on Habryka's Shortform Feed · 2019-04-30T21:40:06.551Z · score: 2 (1 votes) · LW · GW

In my case, it sure feels like I check my karma often because I often want to know what my karma is, but maybe others differ.

Comment by danielfilan on Habryka's Shortform Feed · 2019-04-30T18:36:08.424Z · score: 5 (3 votes) · LW · GW

I mean, you can definitely check your karma multiple times a day to see where the last two sig digits are at, which is something I sometimes do.

Comment by danielfilan on DanielFilan's Shortform Feed · 2019-04-30T02:35:28.567Z · score: 5 (3 votes) · LW · GW

Often big things are made of smaller things: e.g., the economy is made of humans and machines interacting, and neural networks are made of linear functions and ReLUs composed together. Say that a property P survives composition if knowing that P holds for all the smaller things tells you that P holds for the bigger thing. It's nice if properties survive composition, because it's easier to figure out if they hold for small things than to directly tackle the problem of whether they hold for a big thing. Boundedness doesn't survive composition: people and machines are bounded, but the economy isn't. Interpretability doesn't survive composition: linear functions and ReLUs are interpretable, but neural networks aren't.
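
To make the neural-network example concrete, here's a toy sketch (my own illustration, nothing load-bearing) of simple, individually-interpretable pieces composing into something harder to read:

```python
import numpy as np

rng = np.random.default_rng(0)

# The individual pieces are easy to interpret: affine maps and the ReLU.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

def relu(x):
    return np.maximum(0.0, x)

def network(x):
    # Composing the simple pieces gives a small neural network, which is
    # already much harder to read off than any of its parts.
    return W2 @ relu(W1 @ x + b1) + b2

print(network(np.ones(3)))
```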

Comment by danielfilan on DanielFilan's Shortform Feed · 2019-04-30T00:23:22.069Z · score: 19 (4 votes) · LW · GW

Shower thought[*]: the notion of a task being bounded doesn't survive composition. Specifically, say a task is bounded if the agent doing it is only using bounded resources and only optimising a small bit of the world to a limited extent. The task of 'be a human in the enterprise of doing research' is bounded, but the enterprise of research in general is not bounded. Similarly, being a human with a job vs the entire human economy. I imagine keeping this in mind would be useful when thinking about CAIS.

Similarly, the notion of a function being interpretable doesn't survive composition. Linear functions are interpretable (citation: the field of linear algebra), as is the ReLU function, but the consensus is that neural networks are not, or at least not in the same way.

I basically wish that the concepts that I used survived composition.

[*] Actually I had this on a stroll.

Comment by danielfilan on When is rationality useful? · 2019-04-27T01:26:53.281Z · score: 11 (3 votes) · LW · GW

All else equal, do you think a rationalist mathematician will become more successful in their field than a non-rationalist mathematician?

This post by Jacob Steinhardt seems relevant: it's a sequence of models of research, and describes what good research strategies look like in them. He says, of the final model:

Before implementing this approach, I made little research progress for over a year; afterwards, I completed one project every four months on average. Other changes also contributed, but I expect the ideas here to at least double your productivity if you aren't already employing a similar process.

Comment by danielfilan on Speaking for myself (re: how the LW2.0 team communicates) · 2019-04-26T00:29:15.509Z · score: 10 (2 votes) · LW · GW

FWIW, we spend loads of time on belief-communication.

To clarify, I didn't think otherwise (and also, right now, I'm not confident that you thought I did think otherwise).

We still converge on a course of action.

Sure - I now think that my comment overrated how much convergence was necessary for decision-making.

Comment by danielfilan on Speaking for myself (re: how the LW2.0 team communicates) · 2019-04-26T00:27:00.228Z · score: 2 (1 votes) · LW · GW

I get the sense that you don't understand me here.

In a system of mutual understanding, I have a model of your model, and you have a model of my model, but nevertheless any prediction about the world is a result of one of our two models (which might have converged, or at the very least include parts of one another).

We can choose to live in a world where the model in my head is the same as the model in your head, and that this is common knowledge. In this world, you could think about a prediction being made by either the model in my head or the model in your head, but it makes more sense to think about it as being made by our model, the one that results from all the information we both have (just like the integer 3 in my head is the same number as the integer 3 in your head, not two numbers that happen to coincide). If I believed that this was possible, I wouldn't talk about how official group models are going to be impoverished 'common denominator' models, or conclude a paragraph with a sentence like "Organizations don’t have models, people do."

Comment by danielfilan on Speaking for myself (re: how the LW2.0 team communicates) · 2019-04-25T23:05:08.753Z · score: 2 (1 votes) · LW · GW

[E]ven if there are collective decisions, there are no collective models. Not real models.

When the team agrees to do something, it is only because enough of the individual team members individually have models which indicate it is the right thing to do.

There's something kind of worrying/sad about this. One would hope that with a small enough group, you'd be able to have discussion and Aumann-magic convergence lead to common models (and perhaps values?) being held by everybody. In this world, the process of making decisions is about gathering information from team members about the relevant considerations, and then a consensus emerges about what the right thing to do is, driven by consensus beliefs about the likely outcomes. When you can't do this, you end up in voting theory land, where even if each individual is rational, methods to aggregate group preferences about plans can lead to self-contradictory results.

I don't particularly have advice for you here - presumably you've already thought about the cost-benefit analysis of spending marginal time on belief communication - but the downside here felt worth pointing out.

Comment by danielfilan on The Principle of Predicted Improvement · 2019-04-25T18:36:01.140Z · score: 8 (5 votes) · LW · GW

Just to add an additional voice here, I would view that as incorrect in this context, instead referring to the thing that the CEE is saying. The way I'd try to clarify this would be to put the variables varying in the expectation in subscripts after the E, so the CEE equation would look like E_D[P(H=h_i|D)] = P(H=h_i), and the PPI inequality would be E_{H,D}[P(H|D)] ≥ E_H[P(H)].

Comment by danielfilan on The Principle of Predicted Improvement · 2019-04-25T05:28:36.913Z · score: 8 (5 votes) · LW · GW

I should note that when I first saw the PPI inequality, I also didn't get what it was saying, just because I had very low prior probability mass on it saying the thing it actually says. (I can't quite pin down what generalisation or principle led to this situation, but there you go.)

Comment by danielfilan on The Principle of Predicted Improvement · 2019-04-25T05:21:05.995Z · score: 9 (6 votes) · LW · GW

I have a very basic question about notation -- what tells me that H in the equation refers to the true hypothesis?

H stands for hypothesis. We're taking expectations over our distribution over hypotheses: that is, expectations over which hypothesis is true.

Put another way, I don't really understand why that equation has a different interpretation than the conservation-of-expected-evidence equation: E[P(H=h_i|D)] = P(H=h_i).

In the PPI inequality, the expectations are being taken over H and D jointly; in the CEE equation, the expectation is just being taken over D.
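
To make the difference concrete, here's a toy numerical check (my own example; the prior and likelihood numbers are arbitrary):

```python
# Binary hypothesis H with prior p on H=1, and a binary observation D that
# matches H with probability q.
p, q = 0.3, 0.8

# Joint distribution over (H, D).
joint = {(h, d): (p if h else 1 - p) * (q if d == h else 1 - q)
         for h in (0, 1) for d in (0, 1)}

def posterior(h, d):
    p_d = sum(joint[(h2, d)] for h2 in (0, 1))
    return joint[(h, d)] / p_d

# CEE: expectation over D alone of P(H=1|D) equals the prior P(H=1).
cee = sum(sum(joint[(h2, d)] for h2 in (0, 1)) * posterior(1, d) for d in (0, 1))
print(cee, p)  # both 0.3

# PPI: expectation over (H, D) jointly of P(H|D) is at least E_H[P(H)].
ppi_lhs = sum(joint[(h, d)] * posterior(h, d) for h in (0, 1) for d in (0, 1))
ppi_rhs = p * p + (1 - p) * (1 - p)
print(ppi_lhs, ppi_rhs)  # roughly 0.71 >= 0.58
```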

Comment by danielfilan on DanielFilan's Shortform Feed · 2019-04-25T05:18:51.938Z · score: 13 (3 votes) · LW · GW

One result that's related to Aumann's Agreement Theorem is that if you and I alternate saying our posterior probabilities of some event, we converge on the same probability if we have common priors. You might therefore wonder why we ever do anything else. The answer is that describing evidence is strictly more informative than stating one's posterior. For instance, imagine that we've both secretly flipped coins, and want to know whether both coins landed on the same side. If we just state our posteriors, we'll immediately converge to 50%, without actually learning the answer, which we could have learned pretty trivially by just saying how our coins landed. This is related to the original proof of the Aumann agreement theorem in a way that I can't describe shortly.
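
Here's a toy version of the coin example (names and numbers mine): exchanging posteriors about "same side" gives no information, while exchanging the raw observations settles the question immediately.

```python
import random

def coin_example(seed=0):
    rng = random.Random(seed)
    my_coin, your_coin = rng.random() < 0.5, rng.random() < 0.5

    # Exchanging posteriors: each of us knows only our own coin, so each
    # assigns probability 1/2 to "same side"; stating 1/2 back and forth
    # is already a fixed point and teaches neither of us anything.
    my_posterior = your_posterior = 0.5

    # Exchanging evidence: saying how our coins actually landed settles it.
    same_side = (my_coin == your_coin)
    return my_posterior, your_posterior, same_side

print(coin_example())  # (0.5, 0.5, True or False depending on the flips)
```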

Comment by danielfilan on When is rationality useful? · 2019-04-25T04:12:59.699Z · score: 12 (3 votes) · LW · GW

We can model success as a combination of doing useful things and avoiding making mistakes. As a particular example, we can model intellectual success as a combination of coming up with good ideas and avoiding bad ideas. I claim that rationality helps us avoid mistakes and bad ideas, but doesn’t help much in generating good ideas and useful work.

I think this might be sort of right, but note that since plans are hierarchical, rationality can help you avoid mistakes (e.g. failing to spend 5 minutes thinking about good ways to do something important, or assuming that your first impression of a field is right) that would have prevented you from generating good ideas.

Comment by danielfilan on Counterfactuals about Social Media · 2019-04-23T04:31:40.189Z · score: 2 (1 votes) · LW · GW

Eventually my family made a messenger group including my parents, sister, and maternal grandparents, which works OK for this.

Comment by danielfilan on Counterfactuals about Social Media · 2019-04-23T04:28:38.370Z · score: 9 (5 votes) · LW · GW

Things I like about Facebook, beyond what you and elizabeth have mentioned:

  • Makes my day more fun by showing me amusing things.
  • Shows me interesting discussions that I can join in a low-stakes way, and yet the discussions still go places.
  • Introduces me to people I don't know via those interesting discussions, making it a bit easier to get to know them in real life.
  • Helps people around me run events that I'm invited to.
  • Helps me get answers to quick questions I have.
  • Shows me aspects of friends' lives that I wouldn't otherwise be aware of.
  • Lets me poll my friends about topics of interest.
  • Created a centralised messaging system that ~everybody uses.
Comment by danielfilan on Best reasons for pessimism about impact of impact measures? · 2019-04-23T02:04:37.688Z · score: 6 (3 votes) · LW · GW

I also might've expected some people to wonder, given their state interpretation, how come I'm not worried about stuff I mentioned in the whitelisting post anymore

I don't read everything that you write, and when I do read things there seems to be some amount of dropout that occurs resulting in me missing certain clauses (not just in long posts by you, even while proofreading the introduction section of a friend's paper draft!) that I don't notice until quizzed in detail -- I suspect this is partially due to me applying lossy compression that preserves my first guess about the gist of a paragraph, and maybe partially due to literal saccades while reading. The solution is repetition and redundancy: for example, I assume that you tried to do that in your quotes after the phrase "Let's go through some of my past comments about this", but only the quote

[R]elative reachability requires solution of several difficult ontological problems which may not have anything close to a simple core, including both a sensible world state representation and a perfect distance metric

implies to me that we're moving away from a state-based way of thinking, and it doesn't directly say anything about AUP.

Comment by danielfilan on Best reasons for pessimism about impact of impact measures? · 2019-04-23T02:01:21.010Z · score: 6 (3 votes) · LW · GW

This isn't a full response, but it seems to me that Vika is largely talking about problems she perceives with impact measures in general, as defined by "measures of how much impact things have on the world", and is thinking of AUP as an element of this class (as would I, had I not read this comment). Reasons to think this include:

  • A perception of your research as primarily being the development of AUP, and of this post as being research for that development and exposition.
  • The introduction of AUP being in a post titled "Towards a New Impact Measure".

If AUP is not in fact about restricting an agent's impact on the world (or, in other words, on the state of the world), then I would describe it as something other than an "impact measure", since that term is primarily used by people using the way of thinking you denounce (and I believe was invented that way: it seems to have morphed from 'side effects', which strongly suggests effects on parts of the world, according to my quick looking-over of the relevant section of Concrete Problems in AI Safety). Perhaps "optimisation regularisation technique" would be better, although I don't presume to understand your way of thinking about it.

Comment by danielfilan on Value Learning is only Asymptotically Safe · 2019-04-16T07:01:43.130Z · score: 4 (2 votes) · LW · GW

I sort of object to titling this post "Value Learning is only Asymptotically Safe" when the actual point you make is that we don't yet have concrete optimality results for value learning other than asymptotic safety.

We should definitely be writing down sets of assumptions from which we can derive formal results about the expected behavior of an agent, but is there anything to aim for that is stronger than asymptotic safety?

In the case of value learning, given the generous assumption that "we somehow figured out how to design an agent which understood what constituted observational evidence of humanity's reflectively-endorsed utility function", it seems like you should be able to get a PAC-type bound, where by time t, the agent is only ε-suboptimal with probability 1 - δ, where ε is increasing in 1/δ but decreasing in t -- see results on PAC bounds for Bayesian learning, which I haven't actually looked at. This gives you bounds stronger than asymptotic optimality for value learning. Sadly, if you want your agent to actually behave well in general environments, you probably won't get results better than asymptotic optimality, but if you're happy to restrict yourself to MDPs, you probably can.
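
Concretely, the kind of statement I have in mind would look something like this generic template (notation mine, not a claimed result: V* is the optimal value, V^{π_t} the value of the agent's policy at time t):

```latex
\Pr\left[\, V^* - V^{\pi_t} \le \epsilon(t, \delta) \,\right] \ge 1 - \delta,
\qquad \text{with } \epsilon(t, \delta) \text{ decreasing in } t
\text{ and increasing in } 1/\delta .
```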

Comment by danielfilan on Best reasons for pessimism about impact of impact measures? · 2019-04-11T18:20:34.307Z · score: 9 (4 votes) · LW · GW

Unlike humans, AI systems would be able to cause butterfly effects on purpose, and could channel their impact through butterfly effects if they are not penalized.

Indeed - a point I think is illustrated by the Chaotic Hurricanes test case. I'm probably most excited about methods that would use transparency techniques to determine when a system is deliberately optimising for a part of the world (e.g. the members of the long-term future population) that we don't want it to care about, but this has a major drawback of perhaps requiring multiple philosophical advances into the meaning of reference in cognition and a greater understanding of what optimisation is.

Comment by danielfilan on Best reasons for pessimism about impact of impact measures? · 2019-04-11T18:10:38.046Z · score: 2 (1 votes) · LW · GW

No.

Comment by danielfilan on Best reasons for pessimism about impact of impact measures? · 2019-04-11T18:10:16.742Z · score: 2 (1 votes) · LW · GW

I prefer the phrase 'impact regularisation', but indeed that was a slip of the mind.

Comment by danielfilan on Best reasons for pessimism about impact of impact measures? · 2019-04-10T23:07:26.475Z · score: 4 (2 votes) · LW · GW

I think that under the worldview of this concern, the distribution of reward functions effectively defines a representation that, if too different from the one humans care about, will either mean that no realistic impact is possible in the real world or be ineffective at penalising unwanted negative impacts.

Comment by danielfilan on Best reasons for pessimism about impact of impact measures? · 2019-04-10T20:45:16.851Z · score: 19 (9 votes) · LW · GW

My concern is similar to Wei Dai's: it seems to me that at a fundamental physical level, any plan involving turning on a computer that does important stuff will make pretty big changes to the world's trajectory in phase space. Heat dissipation will cause atmospheric particles to change their location and momentum, future weather patterns will be different, people will do things at different times (e.g. because they're waiting for a computer program to run, or because the computer is designed to change the flow of traffic through a city), meet different people, and have different children. As a result, it seems hard for me to understand how impact measures could work in the real world without a choice of representation very close to the representation humans use to determine the value of different worlds. I suspect that this will need input from humans similar to what value learning approaches might need, and that once it's done one could just do value learning and dispense with the need for impact measures. That being said, this is more of an impression than a belief - I can't quite convince myself that no good method of impact regularisation exists, and some other competent people seem to disagree with me.

Comment by danielfilan on DanielFilan's Shortform Feed · 2019-03-25T23:41:11.929Z · score: 7 (4 votes) · LW · GW

I made this post with the intent to write a comment, but the process of writing the comment out made it less persuasive to me. The planning fallacy?

DanielFilan's Shortform Feed

2019-03-25T23:32:38.314Z · score: 19 (5 votes)
Comment by danielfilan on The Amish, and Strategic Norms around Technology · 2019-03-25T20:45:23.927Z · score: 6 (4 votes) · LW · GW

While [at MAPLE], there were limits on technology that meant, after 9pm, you basically had two choices: read, or go to bed. The choices were strongly reinforced by the social and physical environment. And this made it much easier to make choices they endorsed.

Important context for this is that morning chanting started at 4:40 am, so going to sleep at 9 pm was a more endorsable choice than it might appear.

Comment by danielfilan on Privacy · 2019-03-17T01:23:48.202Z · score: 7 (4 votes) · LW · GW

Do you think that thoughts are too incentivised or not incentivised enough on the margin, for the purpose of epistemically sound thinking? If they're too incentivised, have you considered dampening LW's karma system? If they're not incentivised enough, what makes you believe that legalising blackmail will worsen the epistemic quality of thoughts?

Comment by danielfilan on Blackmailers are privateers in the war on hypocrisy · 2019-03-15T20:54:07.042Z · score: 4 (2 votes) · LW · GW

I think this is the OB post Benquo is quoting from; it looks like he accidentally forgot to include the link.

Comment by danielfilan on Blackmailers are privateers in the war on hypocrisy · 2019-03-14T19:08:09.060Z · score: 7 (6 votes) · LW · GW

blackmail-in-practice is often about leveraging the norm enforcement of a different community than the target's, exploiting differences in norms between groups

I'm confused about how you would know this - it seems that by nature, most blackmail-in-practice is going to be unobserved by the wider public, leaving only failed blackmail attempts (which I expect to be systematically different than average since they failed) or your own likely-unrepresentative experiences (if you have any at all).

Comment by danielfilan on Understanding information cascades · 2019-03-14T07:19:31.783Z · score: 15 (4 votes) · LW · GW

This paper looks at the dynamics of information flows in social networks using multi-agent reinforcement learning. I haven't read it, but am impressed by the work of the second author. Abstract:

We model the spread of news as a social learning game on a network. Agents can either endorse or oppose a claim made in a piece of news, which itself may be either true or false. Agents base their decision on a private signal and their neighbors' past actions. Given these inputs, agents follow strategies derived via multi-agent deep reinforcement learning and receive utility from acting in accordance with the veracity of claims. Our framework yields strategies with agent utility close to a theoretical, Bayes optimal benchmark, while remaining flexible to model re-specification. Optimized strategies allow agents to correctly identify most false claims, when all agents receive unbiased private signals. However, an adversary's attempt to spread fake news by targeting a subset of agents with a biased private signal can be successful. Even more so when the adversary has information about agents' network position or private signal. When agents are aware of the presence of an adversary they re-optimize their strategies in the training stage and the adversary's attack is less effective. Hence, exposing agents to the possibility of fake news can be an effective way to curtail the spread of fake news in social networks. Our results also highlight that information about the users' private beliefs and their social network structure can be extremely valuable to adversaries and should be well protected.

Comment by danielfilan on Formalising continuous info cascades? [Info-cascade series] · 2019-03-14T07:11:56.897Z · score: 10 (3 votes) · LW · GW

A relevant result is Aumann's agreement theorem, and offshoots where two Bayesians repeating their probability judgements back and forth will converge on a common belief. Note, though, that that belief isn't always the one they would have if they both knew all their observations (supposing we both privately flip coins, and state our probabilities that we got the same result, we'll spend all day saying 50% without actually learning the answer); nevertheless, you shouldn't expect probabilities to badly asymptote in expectation.

This makes me think that you'll want to think about boundedly-rational models where people can only recurse 3 times, or something. [ETA: or models where some participants in the discourse are adversarial, as in this paper].

Comment by danielfilan on Formalising continuous info cascades? [Info-cascade series] · 2019-03-14T07:04:05.480Z · score: 4 (2 votes) · LW · GW

I think that the rewrite mentioned was actually made, and the post as stands is right.

(Although in this case it's weird to call it an information cascade - in the situation described in the post, people don't have any reason to think that a +50 karma post is any better than a +10 karma post, so information isn't really cascading, just karma).

Comment by danielfilan on A defense on QI · 2019-03-08T05:40:59.137Z · score: 15 (6 votes) · LW · GW

For what it's worth, I'm familiar with the philosophy surrounding the many worlds interpretation (MWI), and while I can't vouch for all the argumentation in that paper, I think that (a) quantum immortality is not a consequence of the MWI, and (b) this paper offers a valid argument for (a).

On a more meta point, I think the strategy of "hear something I disagree with -> look for a debunking" isn't likely to lead you to truth - if you were wrong, how would this strategy help you find out? You could carefully check both the argument you disagree with and the debunking, seeing which is flawed or finding a valid synthesis of both, but from the tone of your post I imagine you finding something that counts as a 'debunking' and not pursuing the matter further. I think it would be wiser to think carefully about the claims in the article, look for counterarguments, think carefully about those, and come to your own conclusions (where perhaps the 'thinking' involves discussing the issues with a friend, or on LessWrong or a similar forum). If you can't make heads or tails of the issue, but think you can identify experts who can, then one other option would be to defer to expert consensus. Sadly, in this case, I can't find a poll of experts, but the Wikipedia page on Quantum Suicide and Immortality only quotes two experts (Max Tegmark and David Deutsch), neither of whom agree that quantum immortality works. As such, I suspect that belief in quantum immortality is very uncommon among experts, since otherwise I'd expect to see an expert quoted in the Wikipedia article supporting the view that quantum immortality is real.

Comment by danielfilan on Karma-Change Notifications · 2019-03-05T19:50:52.048Z · score: 20 (7 votes) · LW · GW

I don't like the idea that LW will tell me my daily karma change but only if it's good news.

Comment by danielfilan on How much funding and researchers were in AI, and AI Safety, in 2018? · 2019-03-04T05:22:11.976Z · score: 10 (2 votes) · LW · GW

I notice that in the case of AI safety, it's probably possible to just literally count the researchers by hand.

I think this is probably not true for the average LW reader, or even the average person who's kind of interested in AI alignment, since many orgs are sort of opaque about how many people work there and what team people are on. For example my guess is that most people don't know how many interns CHAI takes, or how many new PhD students we get in a given year, and similarly, I'm not even confident that I could name everybody in OpenAI's safety team without someone to catch my errors.

I assume for "broader work on AI" it'd be necessary to either consult some kind of research that already had them counted, since there's just way too much stuff going on.

Seems correct to me.

Comment by danielfilan on How much funding and researchers were in AI, and AI Safety, in 2018? · 2019-03-03T23:52:27.640Z · score: 9 (5 votes) · LW · GW

By my quick mental count, CHAI's Berkeley branch had something like the equivalent of 8 to 11 researchers focussing on AI alignment in 2018. Kind of tricky to count because we had new PhD students coming in in August, as well as some interns over the summer (some of whom stayed on for longer periods).

Comment by danielfilan on LW Update 2019-01-03 – New All-Posts Page, Author hover-previews and new post-item · 2019-03-02T08:27:20.947Z · score: 3 (2 votes) · LW · GW

I get the same thing in Firefox.

Robin Hanson on Lumpiness of AI Services

2019-02-17T23:08:36.165Z · score: 16 (6 votes)

Test Cases for Impact Regularisation Methods

2019-02-06T21:50:00.760Z · score: 62 (18 votes)

Does freeze-dried mussel powder have good stuff that vegan diets don't?

2019-01-12T03:39:19.047Z · score: 17 (4 votes)

In what ways are holidays good?

2018-12-28T00:42:06.849Z · score: 22 (6 votes)

Kelly bettors

2018-11-13T00:40:01.074Z · score: 23 (7 votes)

Bottle Caps Aren't Optimisers

2018-08-31T18:30:01.108Z · score: 53 (21 votes)

Mechanistic Transparency for Machine Learning

2018-07-11T00:34:46.846Z · score: 50 (17 votes)

Research internship position at CHAI

2018-01-16T06:25:49.922Z · score: 25 (8 votes)

Insights from 'The Strategy of Conflict'

2018-01-04T05:05:43.091Z · score: 73 (27 votes)

Meetup : Canberra: Guilt

2015-07-27T09:39:18.923Z · score: 1 (2 votes)

Meetup : Canberra: The Efficient Market Hypothesis

2015-07-13T04:01:59.618Z · score: 1 (2 votes)

Meetup : Canberra: More Zendo!

2015-05-27T13:13:50.539Z · score: 1 (2 votes)

Meetup : Canberra: Deep Learning

2015-05-17T21:34:09.597Z · score: 1 (2 votes)

Meetup : Canberra: Putting Induction Into Practice

2015-04-28T14:40:55.876Z · score: 1 (2 votes)

Meetup : Canberra: Intro to Solomonoff induction

2015-04-19T10:58:17.933Z · score: 1 (2 votes)

Meetup : Canberra: A Sequence Post You Disagreed With + Discussion

2015-04-06T10:38:21.824Z · score: 1 (2 votes)

Meetup : Canberra HPMOR Wrap Party!

2015-03-08T22:56:53.578Z · score: 1 (2 votes)

Meetup : Canberra: Technology to help achieve goals

2015-02-17T09:37:41.334Z · score: 1 (2 votes)

Meetup : Canberra Less Wrong Meet Up - Favourite Sequence Post + Discussion

2015-02-05T05:49:29.620Z · score: 1 (2 votes)

Meetup : Canberra: the Hedonic Treadmill

2015-01-15T04:02:44.807Z · score: 1 (2 votes)

Meetup : Canberra: End of year party

2014-12-03T11:49:07.022Z · score: 1 (2 votes)

Meetup : Canberra: Liar's Dice!

2014-11-13T12:36:06.912Z · score: 1 (2 votes)

Meetup : Canberra: Econ 101 and its Discontents

2014-10-29T12:11:42.638Z · score: 1 (2 votes)

Meetup : Canberra: Would I Lie To You?

2014-10-15T13:44:23.453Z · score: 1 (2 votes)

Meetup : Canberra: Contrarianism

2014-10-02T11:53:37.350Z · score: 1 (2 votes)

Meetup : Canberra: More rationalist fun and games!

2014-09-15T01:47:58.425Z · score: 1 (2 votes)

Meetup : Canberra: Akrasia-busters!

2014-08-27T02:47:14.264Z · score: 1 (2 votes)

Meetup : Canberra: Cooking for LessWrongers

2014-08-13T14:12:54.548Z · score: 1 (2 votes)

Meetup : Canberra: Effective Altruism

2014-08-01T03:39:53.433Z · score: 1 (2 votes)

Meetup : Canberra: Intro to Anthropic Reasoning

2014-07-16T13:10:40.109Z · score: 1 (2 votes)

Meetup : Canberra: Paranoid Debating

2014-07-01T09:52:26.939Z · score: 1 (2 votes)

Meetup : Canberra: Many Worlds + Paranoid Debating

2014-06-17T13:44:22.361Z · score: 1 (2 votes)

Meetup : Canberra: Decision Theory

2014-05-26T14:44:31.621Z · score: 1 (2 votes)

[LINK] Scott Aaronson on Integrated Information Theory

2014-05-22T08:40:40.065Z · score: 22 (23 votes)

Meetup : Canberra: Rationalist Fun and Games!

2014-05-01T12:44:58.481Z · score: 0 (3 votes)

Meetup : Canberra: Life Hacks Part 2

2014-04-14T01:11:27.419Z · score: 0 (1 votes)

Meetup : Canberra Meetup: Life hacks part 1

2014-03-31T07:28:32.358Z · score: 0 (1 votes)

Meetup : Canberra: Meta-meetup + meditation

2014-03-07T01:04:58.151Z · score: 3 (4 votes)

Meetup : Second Canberra Meetup - Paranoid Debating

2014-02-19T04:00:42.751Z · score: 1 (2 votes)