Existing work on creating terminology & names? 2020-01-31T12:16:32.650Z · score: 10 (3 votes)
Terms & literature for purposely lossy communication 2020-01-22T10:35:47.162Z · score: 12 (5 votes)
Predictably Predictable Futures Talk: Using Expected Loss & Prediction Innovation for Long Term Benefits 2020-01-08T12:51:01.339Z · score: 13 (3 votes)
[Part 1] Amplifying generalist research via forecasting – Models of impact and challenges 2019-12-19T15:50:33.412Z · score: 53 (13 votes)
[Part 2] Amplifying generalist research via forecasting – results from a preliminary exploration 2019-12-19T15:49:45.901Z · score: 48 (12 votes)
Introducing A New Open-Source Prediction Registry 2019-10-16T14:23:47.229Z · score: 91 (28 votes)
ozziegooen's Shortform 2019-08-31T23:03:24.809Z · score: 17 (6 votes)
Conversation on forecasting with Vaniver and Ozzie Gooen 2019-07-30T11:16:58.633Z · score: 43 (11 votes)
Ideas for Next Generation Prediction Technologies 2019-02-21T11:38:57.798Z · score: 16 (14 votes)
Predictive Reasoning Systems 2019-02-20T19:44:45.778Z · score: 26 (11 votes)
Impact Prizes as an alternative to Certificates of Impact 2019-02-20T00:46:25.912Z · score: 21 (3 votes)
Can We Place Trust in Post-AGI Forecasting Evaluations? 2019-02-17T19:20:41.446Z · score: 23 (9 votes)
The Prediction Pyramid: Why Fundamental Work is Needed for Prediction Work 2019-02-14T16:21:13.564Z · score: 44 (15 votes)
Short story: An AGI's Repugnant Physics Experiment 2019-02-14T14:46:30.651Z · score: 9 (7 votes)
Three Kinds of Research Documents: Clarification, Explanatory, Academic 2019-02-13T21:25:51.393Z · score: 23 (6 votes)
The RAIN Framework for Informational Effectiveness 2019-02-13T12:54:20.297Z · score: 40 (13 votes)
Overconfident talking down, humble or hostile talking up 2018-11-30T12:41:54.980Z · score: 45 (20 votes)
Stabilize-Reflect-Execute 2018-11-28T17:26:39.741Z · score: 32 (10 votes)
What if people simply forecasted your future choices? 2018-11-23T10:52:25.471Z · score: 19 (6 votes)
Current AI Safety Roles for Software Engineers 2018-11-09T20:57:16.159Z · score: 82 (31 votes)
Prediction-Augmented Evaluation Systems 2018-11-09T10:55:36.181Z · score: 44 (16 votes)
Critique my Model: The EV of AGI to Selfish Individuals 2018-04-08T20:04:16.559Z · score: 51 (14 votes)
Expected Error, or how wrong you expect to be 2016-12-24T22:49:02.344Z · score: 9 (9 votes)
Graphical Assumption Modeling 2015-01-03T20:22:21.432Z · score: 23 (18 votes)
Understanding Who You Really Are 2015-01-02T08:44:50.374Z · score: 9 (19 votes)
Why "Changing the World" is a Horrible Phrase 2014-12-25T06:04:48.902Z · score: 28 (40 votes)
Reference Frames for Expected Value 2014-03-16T19:22:39.976Z · score: 5 (23 votes)
Creating a Text Shorthand for Uncertainty 2013-10-19T16:46:12.051Z · score: 6 (11 votes)
Meetup : San Francisco: Effective Altruism 2013-06-23T21:48:34.365Z · score: 3 (4 votes)


Comment by ozziegooen on ozziegooen's Shortform · 2020-07-05T08:22:01.630Z · score: 2 (1 votes)

Thanks! I'll check it out.

Comment by ozziegooen on ozziegooen's Shortform · 2020-06-29T11:46:52.514Z · score: 15 (8 votes)

I was recently pointed to the YouTube channel Psychology in Seattle. I think it's one of my favorite finds in a while.

I'm personally more interested in workplace psychology than relationship psychology, but my impression is that they share a lot of similarities.

Emotional intelligence gets a bit of a bad rap due to its fuzzy nature, but I'm convinced it's one of the top few things for most people to get better at. I know lots of great researchers and engineers who fall into the same failure modes again and again, and this causes severe organizational and personal problems.

Emotional intelligence books and training typically seem quite poor to me. The alternative format here of "let's just show you dozens of hours of people interacting with each other, and point out all the fixes they could make" seems much better than most books or lectures I've seen.

This YouTube series does an interesting job of that. There's a whole bunch of "let's watch this reality TV show, then give our take on it." I'd be pretty excited about there being more things like this posted online, especially in other contexts.

Related, I think the potential of reality TV is fairly underrated in intellectual circles, but that's a different story.

Comment by ozziegooen on ozziegooen's Shortform · 2020-06-26T10:25:16.494Z · score: 4 (2 votes)

Fair point. I imagine, though, that when we plan where to aim things, we can expect to get better at quantifying these things (over the next few hundred years), and can also aim for strategies that would broadly work without assuming precarious externalities.

Comment by ozziegooen on ozziegooen's Shortform · 2020-06-25T09:57:37.573Z · score: 8 (4 votes)

The 4th Estate heavily relies on externalities, and that's precarious.

There's a fair bit of discussion of how much of journalism has died with local newspapers, and separately how the proliferation of news past 3 channels has been harmful for discourse.

In both of these cases, the argument seems to be that a particular type of business transaction resulted in tremendous positive national externalities.

It seems very precarious to me to expect society at large to work only because of a handful of accidental and temporary externalities.

In the longer term, I'm more optimistic about setups where people pay for the ultimate value directly, instead of it being an externality. For instance, instead of buying newspapers, which helps in small part to pay for good journalism, people could donate to nonprofits that directly optimize the government reform process.

If you think about it, the process of:

  • People buy newspapers, and a fraction of them are interested in causing change.
  • Great journalists come across things around government or society that should be changed, and write about them.
  • A bunch of people occasionally get really upset about some of the findings, and report this to authorities or vote differently. ...

is all really inefficient and roundabout compared to what's possible. There's very little division of labor among the public, for instance: there's no coordination where readers realize that there are 20 things that deserve equal attention, and so split into 20 subgroups. This is very real work that readers aren't getting compensated for, so they'll do whatever they personally care about most at the moment.

Basically, my impression is that the US is set up so that a well-functioning 4th estate is crucial to making sure things don't spiral out of control. But this places great demands on the 4th estate that few people are now willing to pay for. Historically this worked through positive externalities, but that's a sketchy place to be. If we develop better methods of coordination in the future, I think it's possible to simply coordinate to pay the costs and solve the problem.

Comment by ozziegooen on What are the best tools for recording predictions? · 2020-05-27T09:55:03.151Z · score: 5 (3 votes)

For those reading: the main thing I'm optimizing Foretold for right now is forecasting experiments and projects with 2-100 forecasters. The spirit of making "quick and dirty" questions for personal use conflicts a bit with that of making "well thought out and clear" questions for group use. The latter are messy to change, because changes would confuse everyone involved.

Note that Foretold does support full probability distributions with the Guesstimate-like syntax, which PredictionBook doesn't. But in general it's less focused on the quick individual use case.
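To illustrate what a full distribution adds over a single probability, here is a minimal sketch of turning a range forecast like `5 to 10` into a distribution. It is not Foretold's actual parser: it assumes that `low to high` denotes a 90% interval of a lognormal, and `parse_range` and `lognormal_from_90ci` are hypothetical helpers.

```python
import math
from statistics import NormalDist
from typing import Tuple

# z-score bounding the central 90% of a standard normal (about 1.645)
Z90 = NormalDist().inv_cdf(0.95)


def parse_range(text: str) -> Tuple[float, float]:
    """Parse a hypothetical "low to high" string such as "5 to 10"."""
    low_str, high_str = text.split(" to ")
    return float(low_str), float(high_str)


def lognormal_from_90ci(low: float, high: float) -> Tuple[float, float]:
    """Return (mu, sigma) of a lognormal whose 5th and 95th percentiles are low and high.

    Assumption: "low to high" is read as a 90% interval.
    """
    if not 0 < low < high:
        raise ValueError("need 0 < low < high")
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma


mu, sigma = lognormal_from_90ci(*parse_range("5 to 10"))
log_dist = NormalDist(mu, sigma)  # distribution of ln(X)
print(f"median: {math.exp(mu):.3f}")                 # median: 7.071
print(f"P(X < 8): {log_dist.cdf(math.log(8)):.2f}")  # a question a point estimate can't answer
```

Fitting in log space keeps the distribution positive and makes the median the geometric mean of the endpoints; a normal fit would be the natural alternative for quantities that can go negative.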

If anyone has recommendations for simple ways to make it better for individuals (maybe other workflows), I'd be up for adding some support or integrations.

Comment by ozziegooen on What are the externalities of predictions on wars? · 2020-04-20T20:39:21.411Z · score: 3 (2 votes)

[retracted: I read the question too quickly, misunderstood it]

My impression, after some thought and discussion (over the last ~1 year or so), is that people being smarter / predicting better will probably decrease the number of wars and make them less terrible. That said, there are of course tails; perhaps some specific wars could be far worse (one country being much better at destroying another).

As I understand it, many wars started in part due to overconfidence: both sides are overconfident about their odds of success (for many reasons). If they were properly calibrated, they would be more likely to make immediate trades, concessions, or similar, rather than fight, which is rather risky.
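The overconfidence argument can be made concrete with a standard bargaining-model sketch (my framing, not from the original discussion): two sides contest a prize of size 1, each pays a cost to fight, and each holds its own belief about its chance of winning. A peaceful split exists exactly when p_a + p_b <= 1 + cost_a + cost_b, so calibrated beliefs (p_a + p_b = 1) always leave room for a deal when fighting is costly, while mutual overconfidence can eliminate it. The function name and numbers are illustrative.

```python
from typing import Optional, Tuple


def bargaining_range(p_a: float, p_b: float,
                     cost_a: float, cost_b: float) -> Optional[Tuple[float, float]]:
    """Splits (A's share x of a prize of size 1) that both sides prefer to war, or None.

    p_a: A's believed probability that A wins a war.
    p_b: B's believed probability that B wins a war.
    cost_a, cost_b: each side's cost of fighting, as a fraction of the prize.
    """
    min_share_for_a = p_a - cost_a        # A accepts x only if x >= its expected war payoff
    max_share_for_a = 1 - (p_b - cost_b)  # B accepts only if 1 - x >= its expected war payoff
    if min_share_for_a > max_share_for_a:
        return None  # equivalent to p_a + p_b > 1 + cost_a + cost_b
    return max(min_share_for_a, 0.0), min(max_share_for_a, 1.0)


# Calibrated beliefs (60% vs 40%), fighting costs each side 10% of the prize.
print(bargaining_range(0.6, 0.4, 0.1, 0.1))  # roughly (0.5, 0.7): many deals beat fighting
# Mutual overconfidence: each side thinks it is the clear favorite.
print(bargaining_range(0.8, 0.7, 0.1, 0.1))  # None: no split satisfies both
```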

Similarly, I wouldn't expect different AGIs to physically fight each other often at all.

Comment by ozziegooen on [Link] Beyond the hill: thoughts on ontologies for thinking, essay-completeness and forecasting · 2020-02-02T23:25:46.985Z · score: 6 (3 votes)

You need to log in if you want to make predictions. You should be able to see others' predictions without logging in. (At least on Firefox and Chrome)

Note the notebook interface is kind of new and still has some quirks that are getting worked out.

Comment by ozziegooen on Existing work on creating terminology & names? · 2020-01-31T21:11:20.060Z · score: 2 (1 votes)

It looks interesting, but my search shall continue. Seems pretty short and not really on naming. I may order a copy though. Thanks!

Comment by ozziegooen on Existing work on creating terminology & names? · 2020-01-31T21:10:38.548Z · score: 4 (2 votes)

Thanks! I've looked at (2) a bit and some other work on Information Architecture.

I've found it interesting but kind of old-school; it seems to have been a big deal when web tree navigation was a big thing, and to have died down after. It also seems pretty applied, in that there isn't much connection with academic theory on how one could think about these classifications.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-31T11:35:52.631Z · score: 12 (3 votes)

More Narrow Models of Credences

Epistemic Rigor
I'm sure this has been discussed elsewhere, including on LessWrong. I haven't spent much time investigating other thoughts on these specific lines. Links appreciated!

The current model of a classically rational agent assumes logical omniscience and precomputed credences over all possible statements.

This is really, really bizarre upon inspection.

First, "logical omniscience" is very difficult, as has been discussed (the Logical Induction paper goes into this).

Second, all possible statements include statements of every complexity class that we know of (from my understanding of complexity theory). "Credences over all possible statements" would include infinitely many credences, many of them about statements that are arbitrarily expensive to evaluate. Even arbitrarily large (but finite) amounts of computation would not be able to hold all of these credences.

Precomputation for things like this is typically a poor strategy, for this reason. The often-better strategy is to compute things on demand.

A nicer definition could be something like:

A credence is the result of an [arbitrarily large] amount of computation being performed using a reasonable inference engine.

It should be quite clear that calculating credences based on existing explicit knowledge is a very computationally-intensive activity. The naive Bayesian way would be to start with one piece of knowledge, and then perform a Bayesian update on each next piece of knowledge. The "pieces of knowledge" can be prioritized according to heuristics, but even then, this would be a challenging process.

I'd like to see specifications of credences that vary with computation or effort. Humans don't currently have efficient methods for using effort to improve our credences, as a computer or agent would be expected to.
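One way to make "credences that vary with computation" concrete is an anytime procedure: process evidence in heuristic priority order and report the current credence whenever the budget runs out. The sketch below is a naive-Bayes version that assumes the pieces of evidence are conditionally independent (so each update adds a log-likelihood ratio); `Evidence`, `credence_with_budget`, and the example numbers are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    description: str
    log_likelihood_ratio: float  # ln P(e | H) - ln P(e | not H)
    priority: float              # heuristic score; higher-priority evidence is processed first


def credence_with_budget(prior: float, evidence: List[Evidence], budget: int) -> float:
    """Credence in H after processing at most `budget` pieces of evidence.

    Naive-Bayes sketch: evidence is assumed conditionally independent given H,
    so each update just adds a log-likelihood ratio to the running log-odds.
    """
    log_odds = math.log(prior / (1 - prior))
    by_priority = sorted(evidence, key=lambda item: item.priority, reverse=True)
    for item in by_priority[:budget]:
        log_odds += item.log_likelihood_ratio
    return 1 / (1 + math.exp(-log_odds))


evidence = [
    Evidence("strong support", math.log(4), priority=3.0),
    Evidence("weak evidence against", math.log(0.5), priority=2.0),
    Evidence("weak support", math.log(2), priority=1.0),
]
for budget in range(4):
    print(budget, round(credence_with_budget(0.5, evidence, budget), 3))
# 0 0.5
# 1 0.8
# 2 0.667
# 3 0.8
```

The credence is a function of the budget rather than a single precomputed number; in practice, most of the difficulty would live in the prioritization heuristic and in dropping the independence assumption.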

Solomonoff's theory of Induction or Logical Induction could be relevant for the discussion of how to do this calculation.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-26T15:12:25.665Z · score: 7 (4 votes)

Intervention dominance arguments for consequentialists

Global Health

There's a fair bit of resistance to long-term interventions from people focused on global poverty, but there are a few distinct things going on here. One is that there could be a disagreement on the use of discount rates for moral reasoning; a second is that long-term interventions are much stranger.

No matter which is chosen, however, I think that the idea of "donate as much as you can per year to global health interventions" seems unlikely to be ideal upon careful reflection.

For the last few years, GiveWell's cost-to-save-a-life estimates have seemed fairly steady. The S&P 500 has not been steady; it has gone up significantly.

Even if you committed to giving purely to global health, you'd be better off if you generally delayed. It seems quite possible that for every life you could have saved in 2010, you could have saved 2 or more if you had invested the money and spent it in 2020, with a fairly typical investment strategy. (Arguably leverage could have made this much higher.) From what I understand, the one life saved in 2010 would likely not have resulted in one extra life-equivalent saved in 2020; its annual "returns" were likely lower than those of the stock market.

One could of course say something like, "My discount rate is over 3-5% per year, so that outweighs this benefit." But if that were true, it seems likely that the opposite strategy could have worked: one could have borrowed a lot of money in 2010, donated it, and then spent the next 10 years paying it back.

Thus, it would be suspiciously convenient if one's enlightened preferences happened to recommend neither investing for long periods nor borrowing.
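The arithmetic behind this argument is short enough to spell out. The sketch assumes a constant cost per life (in line with the observation that GiveWell's estimates looked fairly steady) and constant annual rates; the 7% return and the rates in the examples are hypothetical placeholders, not historical figures.

```python
def delay_multiplier(annual_return: float, years: int) -> float:
    """Lives saved by investing for `years` and then donating, per life saved by donating now.

    Assumes the cost to save a life stays constant over the period.
    """
    return (1 + annual_return) ** years


def best_strategy(discount_rate: float, invest_return: float, borrow_rate: float) -> str:
    """Which of three giving strategies a constant discount rate favors.

    Investing then giving beats giving now iff invest_return > discount_rate.
    Borrowing to give now (repaying later) beats giving now iff discount_rate > borrow_rate.
    Giving now only wins when invest_return <= discount_rate <= borrow_rate.
    """
    if invest_return > discount_rate:
        return "invest, then give"
    if discount_rate > borrow_rate:
        return "borrow, give now"
    return "give now"


print(round(delay_multiplier(0.07, 10), 2))  # 1.97: roughly two lives per life over a decade
print(best_strategy(discount_rate=0.04, invest_return=0.07, borrow_rate=0.05))  # invest, then give
print(best_strategy(discount_rate=0.10, invest_return=0.07, borrow_rate=0.05))  # borrow, give now
```

Giving now only comes out on top when the discount rate sits in the narrow band between investment returns and borrowing costs, which is the coincidence the paragraph above is pointing at.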

EA Saving

One obvious counter to immediate donations would be to suggest that the EA community invest its money financially, perhaps with leverage.

While it is difficult to tell if other interventions may be better, it can be simpler to ask if they are dominant; in this case, that means that they predictably increase EA-controlled assets at a rate higher than financial investments would.

A good metaphor could be to consider the finances of cities. Hypothetically, cities could invest much of their earnings near-indefinitely or at least for very long periods, but in practice, this typically isn't key to their strategies. Often they can do quite well by investing in themselves. For instance, core infrastructure can be expensive but predictably lead to significant city revenue growth. Often these strategies are so effective that cities issue bonds in order to pay for more of this kind of work.

In our case, there could be interventions that are obviously dominant to financial investment in a similar way. An obvious one would be education; if it were clear that giving or lending someone money would lead to predictable donations, that could be a dominant strategy to more generic investment strategies. Many other kinds of community growth or value promotion could also fit into this kind of analysis. Related, if there were enough of these strategies available, it could make sense for loans to be made in order to pursue them further.

What about a non-EA growth opportunity? Say, "vastly improving scientific progress in one specific area." This could be dominant (to investment, for EA purposes) if it would predictably help EA purposes by more than the investment returns. This could be possible. For instance, perhaps a $10mil donation to life extension research[1] could predictably increase $100mil of EA donations by 1% per year, starting in a few years.
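The life-extension example can be checked for dominance with a present-value comparison. This sketch reads "$100mil of EA donations" as an annual amount, treats the 1% increase as permanent, takes "starting in a few years" as year 3, and discounts at the market return the $10mil would otherwise have earned; all of these are assumptions, and the function name is hypothetical.

```python
def pv_of_extra_donations(base_annual_donations: float, uplift: float,
                          start_year: int, market_return: float) -> float:
    """Present value of a permanent uplift in annual donations.

    Extra donations of base_annual_donations * uplift arrive at the end of every
    year from start_year onward, discounted at the forgone market return.
    """
    annual_extra = base_annual_donations * uplift
    # A perpetuity paying from year 1 is worth annual_extra / r today;
    # shifting its start to start_year divides by (1 + r) ** (start_year - 1).
    return annual_extra / market_return / (1 + market_return) ** (start_year - 1)


cost = 10e6
for r in (0.07, 0.10):
    pv = pv_of_extra_donations(100e6, 0.01, start_year=3, market_return=r)
    verdict = "dominates investing" if pv > cost else "does not dominate"
    print(f"market return {r:.0%}: PV ${pv / 1e6:.1f}M vs cost $10.0M -> {verdict}")
```

At a 7% market return the uplift is worth about $12.5M, so under these assumptions the donation dominates; at 10% it is worth about $8.3M and does not. The verdict is quite sensitive to the discount rate and to how soon the effect starts.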

One trick with these strategies is that many would fall into the bucket of "things a generic wealthy group could do to increase their wealth", which is mediocre because we should expect that type of thing to be well-funded already. We may also want interventions that differentially change wealth amounts.

Kind of sadly, this seems to suggest that some resulting interventions may not be "positive sum" for all relevant stakeholders. Many of the interventions that are "positive sum with respect to other powerful interests" may already be funded, so the remaining ones could be relatively neutral or zero-sum for other groups.

[1] I'm just using life extension because the argument would be simple, not because I believe it could hold. I think it would be quite tricky to find great options here, as is evidenced by the fact that other very rich or powerful actors would have similar motivations.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-24T12:48:49.988Z · score: 2 (1 votes)

Update: After I wrote this shortform, I did more investigation in Pragmatics and realized most of this was better expressed there.

Comment by ozziegooen on 2018 Review: Voting Results! · 2020-01-24T11:51:36.625Z · score: 2 (1 votes)

Interesting. From the data, it looks like there's a decent linear correlation up to around 150 Karma or so, and then after that the correlation looks more nebulous.

Comment by ozziegooen on 2018 Review: Voting Results! · 2020-01-24T10:40:19.731Z · score: 13 (3 votes)

I'm quite curious how this ordering correlated with the original LessWrong Karma of each post, if that analysis hasn't been done yet. Perhaps I'd be more curious to better understand what a great ordering would be. I feel like there are multiple factors taken into account when voting, and it's also quite possible that the userbase represents multiple clusters that would have distinct preferences.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-21T22:43:36.693Z · score: 2 (1 votes)

One nice thing about cases where the interpretations matter is that the interpretations are often easier to measure than intent (at least for public figures). Authors can hide or lie about their intent, or just never choose to reveal it. Interpretations can be measured using surveys.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-21T22:39:12.332Z · score: 12 (3 votes)

It seems like there are a few distinct kinds of questions here.

  1. You are trying to estimate the EV of a document.
    Here you want to understand the expected and actual interpretations of the document. The intention only matters insofar as it affects the interpretations.

  2. You are trying to understand the document.
    Example: You're reading a book on probability to understand probability.
    Here the main thing to understand is probably the author intent. Understanding the interpretations and misinterpretations of others is mainly useful so that you can understand the intent better.

  3. You are trying to decide if you (or someone else) should read the work of an author.
    Here you would ideally understand the correctness of the interpretations of the document, rather than that of the intention. Why? Because you will also be interpreting it, and are likely somewhere in the range of people who have interpreted it. For example, if you are told, "This book is apparently pretty interesting, but every single person who has attempted to read it, besides one, apparently couldn't get anywhere with it after spending many months trying", or worse, "This author is actually quite clever, but the vast majority of people who read their work misunderstand it in profound ways", you should probably not make an attempt, unless you are highly confident that you are much better than the mentioned readers.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-21T12:48:53.452Z · score: 4 (2 votes)

Relatedly, there seems to be a good deal of academic literature on intention vs. interpretation in art, though maybe less on news and media.

Some other semi-related links:

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-21T12:36:50.865Z · score: 12 (7 votes)

Communication should be judged for expected value, not intention (by consequentialists)

TLDR: When trying to understand the value of information, understanding the public interpretations of that information could matter more than understanding the author's intent. When trying to understand the information for other purposes (like, reading a math paper to understand math), this does not apply.

If I were to scream "FIRE!" in a crowded theater, it could cause a lot of damage, even if my intention were completely unrelated. Perhaps I was responding to a devious friend who asked, "Would you like more popcorn? If yes, shout 'FIRE!'"

Not all speech is protected by the First Amendment, in part because speech can be used for expected harm.

One common defense of incorrect predictions is to claim that their interpretations weren't their intentions. "When I said that the US would fall if X were elected, I didn't mean it would literally end. I meant more that..." These kinds of statements were discussed at length in Expert Political Judgment.

But this defense rests on the idea that communicators should be judged on intention, rather than expected outcomes. In those cases, it was often clear that many people interpreted these "experts" as making fairly specific claims that were later rejected by their authors. I'm sure that much of this could have been predicted. The "experts" often clearly weren't going out of their way to make their after-the-outcome interpretations clear before the outcome.

I think it's clear that the intention-interpretation distinction is considered highly important by a lot of people, so much so that they argue that interpretations, even predictable ones, are less significant than intentions in decision-making around speech acts. I.e., "The important thing is to say what you truly feel; don't worry about how it will be understood."

But for a consequentialist, this distinction isn't particularly relevant. Speech acts are judged on expected value (and thus expected interpretations), because all acts are judged on expected value. Similarly, I think many consequentialists would claim that there's nothing metaphysically unique about communication as opposed to other actions one could take in the world.

Some potential implications:

  1. Much of communicating online should probably be about developing empathy for the reader base, and a sense for what readers will misinterpret, especially if such misinterpretation is common (which it seems to be).
  2. Analyses of the interpretations of communication could be more important than analyses of the intentions of communication. I.e., understanding authors and artistic works in large part by understanding their effects on their viewers.
  3. It could be very reasonable to attempt to map non-probabilistic forecasts into probabilistic statements based on what readers would interpret. Then these forecasts can be scored using scoring rules, just like regular probabilistic statements. This would go something like: "I'm sure that Bernie Sanders will be elected" -> "The readers of that statement seem to think the author was assigning a probability of 90-95% to the statement 'Bernie Sanders will win'" -> a Brier/log score.
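Implication 3 can be sketched directly: survey readers about what probability they think a verbal forecast expressed, aggregate their answers, and score the result with standard proper scoring rules. The survey numbers below are hypothetical, and taking the mean is just one aggregation choice (a median would be more robust to outlier readers).

```python
import math
from statistics import mean
from typing import List


def interpreted_probability(reader_estimates: List[float]) -> float:
    """Aggregate readers' answers to "what probability did the author mean?" (here: the mean)."""
    return mean(reader_estimates)


def brier_score(p: float, happened: bool) -> float:
    """Squared error between the probability and the 0/1 outcome; lower is better."""
    return (p - (1.0 if happened else 0.0)) ** 2


def log_score(p: float, happened: bool) -> float:
    """Natural log of the probability given to what actually happened; higher is better."""
    return math.log(p if happened else 1 - p)


# Hypothetical survey of readers of "I'm sure that Bernie Sanders will be elected".
survey = [0.90, 0.95, 0.92, 0.93]
p = interpreted_probability(survey)
print(f"interpreted probability: {p:.3f}")  # 0.925
# Scores under each possible outcome:
for happened in (True, False):
    print(happened, round(brier_score(p, happened), 4), round(log_score(p, happened), 3))
```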

Note: Please do not interpret this statement as attempting to say anything about censorship. Censorship is a whole different topic with distinct costs and benefits.

Comment by ozziegooen on Go F*** Someone · 2020-01-20T23:09:20.631Z · score: 2 (1 votes)

Thanks for the response!

For what it's worth, I predict that this would have gotten more upvotes, here at least, with different language, though I realize this was not made primarily for LW.

my personal opinion is that LW shouldn't cater to people who form opinions on things before reading them and we should discourage them from hanging out here.

I think this is a complicated issue. I could appreciate where it's coming from and could definitely imagine things going too far in either direction. I imagine that both of us would agree it's a complicated issue, and that there's probably some line somewhere, though we may of course disagree on where specifically it is.

I find a literal-ish reading of your phrase there hard to interpret. I feel like I start with priors on things all the time. Like, if I know an article comes from The NYTimes vs. The Daily Stormer, that snippet of data alone gives me what seems like useful information. There's a ton of stuff online I choose not to read because it seems to come from sources I can't trust, based on the source or a quick read of the headline.

Comment by ozziegooen on Go F*** Someone · 2020-01-20T22:59:51.742Z · score: 2 (1 votes)

A bit more thinking:

I would guess that one reason why you had a strong reaction, and/or why several people upvoted you so quickly, was because you/they were worried that my post would be understood by some as "censorship=good" or "LessWrong needs way more policing".

If so, I think that's a great point! It's similar to my original point!

Things get misunderstood all the time.

I tried my best to make my post understandable. I tried my best to qualify it so that people wouldn't misinterpret or overinterpret it. But then my post was misunderstood (from what I can tell, unless I'm seriously misunderstanding Ben here), literally within 30 minutes.

My attempt provably failed. I'll try harder next time.

Comment by ozziegooen on Go F*** Someone · 2020-01-20T22:39:54.280Z · score: 9 (4 votes)

Did you interpret me to say, "One should be sure that zero readers will feel offended?" I think that would clearly be incorrect. My point was that there are cases where one may believe that a bunch of readers may be offended, with relatively little cost to change things to make that not the case.

For instance, one could make lots of points that use alarmist language to poison the well, where the language is technically correct, but very predictably misunderstood.

I think there is obviously some line. I imagine you would as well. It's not clear to me where that line is. I was trying to flag that I think some of the language in this post may have crossed it.

Apologies if my phrasing was misunderstood. I'll try changing that to be more precise.

Comment by ozziegooen on Go F*** Someone · 2020-01-20T21:53:12.511Z · score: 9 (5 votes)

I think I'm fairly uncomfortable with some of the language in this post being on LessWrong as such. It seems from the other comments that some people find some of the information useful, which is a positive signal. However, there are 36 votes on this, with a net of +12, which is a pretty mixed signal. My impression is that few of the negative voters gave descriptive comments.

I think with any intense language the issue isn't only "Is this effective language to convey the point without upsetting an ideal reader", but also something like, "Given that there is a wide variety of readers, are we sufficiently sure that this will generally not needlessly offend or upset many of them, especially in ways that could easily be improved upon?"

I could imagine casual readers quickly looking at this and assuming it's related to the PUA community or similar groups that have some sketchy connotations.

This presents two challenges. First, anyone who makes this inference may also assume that other writers on LessWrong share similar beliefs to what they think this kind of writing signals to them. Second, it may attract other writing that may be quite bad in ways we definitely don't want.

I would suggest that in the future, posts here either don't use such dramatic language or, at the very least, are done as link posts.

I'd be curious if others have takes on this issue; it's definitely possible my intuitions are off here.

Comment by ozziegooen on ACDT: a hack-y acausal decision theory · 2020-01-15T19:33:04.059Z · score: 5 (3 votes)

Nice post! I found the diagrams particularly readable, it makes a lot of sense to me to have them in such a problem.

I'm not very well-read on this sort of work, so feel free to ignore any of the following.

The key question I have is the correctness of the section:

In a sense, ACDT can be seen as anterior to CDT. How do we know that causality exists, and the rules it runs on? From our experience in the world. If we lived in a world where the Newcomb problem or the predictors exist problem were commonplace, then we'd have a different view of causality.

It might seem gratuitous and wrong to draw extra links coming out of your decision node - but it was also gratuitous and wrong to cut all the links that go into your decision node. Drawing these extra arrows undoes some of the damage, in a way that a CDT agent can understand (they don't understand things that cause their actions, but they do understand consequences of their actions).

I don't quite see why the causality is this flexible and arbitrary. I haven't read Causality, but think I get the gist.

It's definitely convenient here to be uncertain about causality. But it would be similarly convenient to have uncertainty about the correct decision theory. A similar formulation could involve a meta-decision-algorithm that tries different decision algorithms until one produces favorable outcomes. Personally, I think I'd be more easily convinced that acausal decision theory is correct than that a different causal structure is correct.

Semi-related, one aspect of Newcomb's problem that has really confused me is the potential for Omega to have scenarios that favor incorrect beliefs. One could, somewhat arbitrarily, imagine Omega offering $1,000 only if it could tell that one believes that "19 + 2 = 20". One could solve that by imagining that the participant should have uncertainty about what "19 + 2" is, trying out multiple options, and seeing which would produce the most favorable outcome.


If it's encountered the Newcomb problem before, and tried to one-box and two-box a few times, then it knows that the second graph gives more accurate predictions

To be clear, I'd assume that the agent would be smart enough to simulate this before actually doing it? The outcome seems fairly apparent to me.
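For what it's worth, the "try both and compare" idea is cheap to simulate, which is one way an agent could do it before acting for real. The sketch below implements the meta-decision-algorithm idea from earlier in this comment (keep whichever policy has earned more), not ACDT's own procedure of learning extra causal arrows. The payoffs are the standard Newcomb amounts; modeling the predictor as correct with a fixed probability is an assumption, and the function names are hypothetical.

```python
import random
from typing import Dict

BOX_A = 1_000        # transparent box: always contains $1,000
BOX_B = 1_000_000    # opaque box: filled only if the predictor expects one-boxing


def newcomb_payoff(one_box: bool, predictor_accuracy: float, rng: random.Random) -> int:
    """Payoff of one round against a predictor that is right with probability predictor_accuracy."""
    predicted_one_box = one_box if rng.random() < predictor_accuracy else not one_box
    opaque = BOX_B if predicted_one_box else 0
    return opaque if one_box else opaque + BOX_A


def pick_policy_by_experience(trials: int, predictor_accuracy: float, seed: int = 0) -> str:
    """Try each policy `trials` times (e.g. in simulation) and keep the better average."""
    rng = random.Random(seed)
    averages: Dict[str, float] = {}
    for name, one_box in (("one-box", True), ("two-box", False)):
        total = sum(newcomb_payoff(one_box, predictor_accuracy, rng) for _ in range(trials))
        averages[name] = total / trials
    return max(averages, key=averages.get)


print(pick_policy_by_experience(trials=100, predictor_accuracy=0.99))  # one-box
```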

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-14T20:35:57.680Z · score: 4 (2 votes)

One question around the "Long Reflection" or around "What will AGI do?" is something like, "How bottlenecked will we be by scientific advances that we'll then need to spend significant resources on?"

I think some assumptions that this model typically makes are:

  1. There will be decision-relevant unknowns.
  2. Many decision-relevant unknowns will be EV-positive to work on.
  3. Of the decision-relevant unknowns that are EV-positive to work on, these will take between 1% and 99% of our time.

(3) seems quite uncertain to me in the steady state. I believe it reflects an intuitive estimate spanning 2 orders of magnitude, while the actual uncertainty is much wider than that. If this were the case, it would mean:

  1. Almost all possible experiments are either trivial (<0.01% of resources, in total), or not cost-effective.
  2. If some things are cost-effective and still expensive (they will take over 1% of the AGI lifespan), it's likely that they will take 100%+ of the time. Even if they would take 10^10% of the time, in expectation, they could still be EV-positive to pursue. I wouldn't be surprised if there were one single optimal thing like this in the steady state. So this strategy would look something like, "Do all the easy things, then spend a huge amount of resources on one gigantic, but EV-high, challenge."
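The point about the 1%-99% band can be quantified under an assumed prior. If the fraction of resources a decision-relevant unknown would take is log-uniform over many orders of magnitude, only a small share of the probability lands in that band. The support below (10^-6 to 10^8 as a fraction, i.e. 0.0001% to 10^10% of resources) is a hypothetical choice loosely matching the figures above.

```python
import math


def log_uniform_prob(lo: float, hi: float, support_lo: float, support_hi: float) -> float:
    """P(lo <= X <= hi) for X log-uniformly distributed on [support_lo, support_hi]."""
    lo, hi = max(lo, support_lo), min(hi, support_hi)
    if lo >= hi:
        return 0.0
    return math.log(hi / lo) / math.log(support_hi / support_lo)


SUPPORT = (1e-6, 1e8)  # hypothetical: from 0.0001% to 10^10% of total resources
buckets = {
    "trivial (< 0.01%)": (1e-6, 1e-4),
    "intuitive band (1% to 99%)": (0.01, 0.99),
    "over 100%": (1.0, 1e8),
}
for name, (lo, hi) in buckets.items():
    print(f"{name}: {log_uniform_prob(lo, hi, *SUPPORT):.3f}")
# trivial (< 0.01%): 0.143
# intuitive band (1% to 99%): 0.143
# over 100%: 0.571
```

The buckets don't sum to 1 because the gaps (0.01% to 1%, and 99% to 100%) are left out; under this prior most of the mass sits above 100%, which matches the "one gigantic challenge" conclusion.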

(This was inspired by a talk that Anders Sandberg gave)

Comment by ozziegooen on Predictors exist: CDT going bonkers... forever · 2020-01-14T18:26:20.793Z · score: 3 (2 votes)

I like this formulation. Personally, I've felt that Newcomb's problem is a bit overly complex and counter-intuitive. Arguably Newcomb's problem with transparent boxes would be the same as regular Newcomb's problem, for instance.

Andrew Critch once mentioned a similar problem involving rock-paper-scissors and Bayes. The situation was, "Imagine you are playing a game of rock-paper-scissors against an Omega who can near-perfectly predict your actions. What should your estimate be for the winning decisions?" The idea was that a Bayesian would have to assign a 33.33...% + delta chance of winning, and so expect to win about 33.33% + delta of the time, yet they would predictably win ~0% of the time; this showcases a flaw in Bayes. However, it was claimed that Logical Induction would handle this.
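A toy simulation of the rock-paper-scissors setup, assuming the predictor is perfect rather than near-perfect (a simplification): the agent's naive per-round estimate is 1/3, but its realized win rate is 0. This only illustrates the gap between the naive estimate and the outcome; it doesn't model how Logical Induction would handle the situation.

```python
import random

# Maps each move to the move that beats it.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def agent_win_rate(rounds: int, seed: int = 0) -> float:
    """The agent picks uniformly at random; the predictor (modeled as perfect)
    learns the agent's move and plays its counter. Returns the agent's win rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(rounds):
        agent_move = rng.choice(list(BEATS))
        predictor_move = BEATS[agent_move]
        # The agent wins only if its move beats the predictor's move.
        if BEATS[predictor_move] == agent_move:
            wins += 1
    return wins / rounds

naive_estimate = 1 / 3  # the agent's naive per-round chance of winning
print(naive_estimate, agent_win_rate(10_000))  # realized win rate is 0.0
```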

Another game that came to mind from your post is Three-card Monte with a dealer who chose randomly but was really good at reading minds.

I definitely would acknowledge this as a nasty flaw in a Bayesian analysis, but could easily imagine that it's a flaw in the naive use of Bayesian analysis, rather than the ideal.

I was a bit curious about the possibility of imagining what reflective Bayes would look like. Something like,

In the case of rock-paper-scissors, the agent knows that

It could condition on this, making a much longer claim,

One obvious issue that comes up is that the justifications for Bayes rest on axioms of probability that clearly don't hold up here. I'd assume that the probability space over some outcomes isn't a proper measure at all, as the probabilities don't sum to 1.

Comment by ozziegooen on Are "superforecasters" a real phenomenon? · 2020-01-13T12:11:43.397Z · score: 8 (2 votes) · LW · GW

Fair point. I'm sure you expect some correlation between the use of reasonable incentive structures and useful updating though. It may not be perfect, but I'd be surprised if it were 0.

Comment by ozziegooen on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-13T08:23:41.641Z · score: 3 (2 votes) · LW · GW

I'm quite excited about things like this. This specific proposal seems reasonable to me; I definitely prefer it over A!B syntax, which I've found confusing.

I previously pondered possible annotations to express uncertainty.

I'm quite curious when it will be possible to use ML systems to make automatic annotations. I could imagine some possible browser extensions that could really augment reading ability and clarity.

Comment by ozziegooen on Are "superforecasters" a real phenomenon? · 2020-01-12T11:53:48.661Z · score: 10 (3 votes) · LW · GW

I think one really important decision-relevant question is:

"Do we need to have forecasters spend years forecasting questions before we can get a good sense of how good they are, or can we get most of that information with a quick (<1 week) test?"

My impression is that the Good Judgement Project used several tests to try to identify forecasters, but the tests didn't predict who would become a superforecaster as well as some may have hoped.

Do you think that almost all of this can be explained either by:

  1. Diligence on the questions, similar to your example of the MMORPG?
  2. Other simple things that we may be able to figure out in the next few years?

If so, I imagine the value of being a "superforecaster" would go down a bit, but the value of being "a superforecaster in expectation" would go up.

Comment by ozziegooen on CFAR Participant Handbook now available to all · 2020-01-12T11:50:13.804Z · score: 2 (1 votes) · LW · GW

I personally kind of like PDFs. PDF Expert on the iPad is pretty great; it crops things if you want, and I find PDFs good for annotation. My impression is that a lot of academics like PDFs for similar reasons (there are at least some valid reasons why they are popular).

There are also programs that read PDFs aloud, which are kinda nice, though I'm sure similar tools exist for EPUB/MOBI.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-10T11:21:24.724Z · score: 2 (1 votes) · LW · GW

Yep, I would generally think so.

I was doing what may be a poor steelman of my assumptions of how others would disagree; I don't have a great sense of what people who would disagree would say at this point.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T22:55:37.777Z · score: 3 (2 votes) · LW · GW

One extreme case would be committing suicide because your secret is that important.

A less extreme case may be being OK with forgetting information; you're losing value, but the cost to maintain it wouldn't be worth it. (In this case the information is positive, though.)

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T22:17:54.758Z · score: 2 (1 votes) · LW · GW

Good points!

Also, thanks for the link, that's pretty neat.

Instead of scoring people against expert assessments, you could also potentially score them against the final aggregated, extremized distribution.
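A minimal sketch of scoring against an extremized aggregate for binary questions, assuming one common recipe (average the forecasts' log-odds, then multiply by an extremizing factor greater than 1) and an expected log score as the scoring rule. The factor `alpha = 2.5` and all function names are hypothetical choices for illustration, not a method from the original.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def extremized_aggregate(probs: list[float], alpha: float = 2.5) -> float:
    """Mean log-odds of the individual forecasts, scaled by an (assumed)
    extremizing factor alpha > 1, mapped back to a probability."""
    mean_log_odds = sum(logit(p) for p in probs) / len(probs)
    return sigmoid(alpha * mean_log_odds)

def score_against_aggregate(p: float, q_agg: float) -> float:
    """Expected log score of forecast p if the outcome were distributed
    according to q_agg. Higher is better; it is maximized at p == q_agg."""
    return q_agg * math.log(p) + (1 - q_agg) * math.log(1 - p)

forecasts = [0.6, 0.7, 0.65]
agg = extremized_aggregate(forecasts)
best = max(forecasts, key=lambda p: score_against_aggregate(p, agg))
print(round(agg, 3), best)
```

Because the expected log score is a proper scoring rule, a forecaster scores best against the aggregate by reporting something close to it, which is the property that makes this a usable stand-in for expert assessment.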

I think that an efficient use of expert assessments would be for the experts to see the aggregate and then adjust it as necessary, while trying not to do much original research. I just wrote a more recent shortform post about this.

One issue with any framework like this is that general calibration may be very different from calibration at the tails.

I think that we can get calibration to be as good as experts can figure out, and that could be enough to be really useful.

Comment by ozziegooen on FactorialCode's Shortform · 2020-01-08T21:19:27.151Z · score: 2 (1 votes) · LW · GW

I've also been thinking about this. I think link-posts are a good first step, and maybe we should make more link-posts for papers we find interesting. But one issue I have with LW is that it's pretty blog-like (similar to Reddit and Hacker News), so it could be difficult for old papers to accumulate reviews and comments over the long period during which people keep reading them.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T21:14:10.904Z · score: 2 (1 votes) · LW · GW

From a conceptual perspective, we expect the tails to be dominated by unknown unknowns and black swans.

I'm not sure. The reasons things happen at the tails typically fall into a fairly small set of categories.

For instance:

  • The question wasn't understood correctly.
  • A significant exogenous event happened.

But as we make many estimates, we could gather empirical data about how often each of these happens, and use it to estimate the potential for future tail outcomes.
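A minimal sketch of turning past resolutions into tail-cause rates. The category names and the counts below are made up purely for illustration, and add-one (Laplace) smoothing is one simple choice among many; it keeps a never-yet-observed category from being assigned a rate of zero.

```python
from collections import Counter

def tail_cause_rates(resolutions: list[str], categories: list[str]) -> dict[str, float]:
    """Estimate how often each resolution category occurs, with add-one
    (Laplace) smoothing so unseen categories still get a small nonzero rate."""
    counts = Counter(resolutions)
    total = len(resolutions) + len(categories)
    return {c: (counts[c] + 1) / total for c in categories}

# Hypothetical categories, mirroring the bullets above.
CATEGORIES = ["within_forecast", "question_misunderstood", "exogenous_event"]
# Made-up history of 100 resolved forecasts, for illustration only.
past = (["within_forecast"] * 95
        + ["question_misunderstood"] * 3
        + ["exogenous_event"] * 2)

rates = tail_cause_rates(past, CATEGORIES)
print(rates)
```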

This is a bit different from what I was mentioning, which was more about known but small risks. For instance, the "amount of time I spend on my report next week" may be an outlier if I die. But the chance of a serious accident or death can be estimated reasonably well. These are often repeated known knowns.
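One way to fold a known small risk into a forecast is a mixture: with a small probability the outcome is 0 (no time spent on the report), otherwise it follows the usual distribution. The sketch below assumes a normal base distribution with negligible mass below 0, and the numbers (mean 10 hours, 0.1% risk) are illustrative, not estimates.

```python
from statistics import NormalDist

def mixture_quantile(q: float, p_zero: float, base: NormalDist) -> float:
    """Quantile of a mixture: with probability p_zero the outcome is exactly 0
    (e.g. a serious accident), otherwise it follows `base`.
    Assumes `base` puts negligible probability below 0 and 0 < q < 1."""
    if q <= p_zero:
        return 0.0
    # For x >= 0: F(x) = p_zero + (1 - p_zero) * G(x); solve F(x) = q for x.
    return base.inv_cdf((q - p_zero) / (1 - p_zero))

hours = NormalDist(mu=10, sigma=2)  # illustrative "normal week" distribution
print(mixture_quantile(0.0005, 0.001, hours))  # 0.0: the far tail is the known risk
print(mixture_quantile(0.5, 0.001, hours))     # close to 10: the median barely moves
```

The known risk only changes the extreme tail, which is why it can be handled separately from the everyday calibration of the main forecast.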

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T19:55:47.765Z · score: 2 (1 votes) · LW · GW

Prediction evaluations may be best when minimally novel

Imagine a prediction pipeline that is resolved with a human/judgemental evaluation. For instance, a group today starts predicting what a trusted judge 10 years from now will say for the question, "How much counterfactual GDP benefit did policy X make, from 2020-2030?"

So, there are two stages:

  1. Prediction
  2. Evaluation

One question for the organizer of such a system is how many resources to allocate to the prediction step vs. the evaluation step. It could be expensive to pay both predictors and evaluators, so it's not clear how to weigh these steps against each other.

I've suspected that there are ways to be stingy with the evaluators, and I now have a better sense of why that is the case.

Imagine a model where the predictors gradually discover information I_predictors about I_total, the true ideal information needed to make this estimate. Imagine that they are well calibrated, and use the comment sections to express their information when predicting.

Later the evaluator comes by. Because they can read everything written so far, they start with I_predictors. They can use this to calculate Prediction(I_predictors), although this should already have been estimated by the previous predictors (à la the best aggregate).

At this point the evaluator can choose to try to get more information, I_evaluation > I_predictors. However, if they do, the resulting probability distribution would be predicted by Prediction(I_predictors). As far as the predictors are concerned, the expected value of Prediction(I_evaluation) should be the same as that of Prediction(I_predictors), assuming that Prediction(I_predictors) is calibrated, except that it will have more risk/randomness. Risk is generally not a desirable property. I've written about similar topics in this post.

Therefore, the predictors should generally prefer Prediction(I_predictors) to Prediction(I_evaluation), as long as everyone's predictions are properly calibrated. This difference generally shouldn't change their predictions unless a complex or odd scoring rule is used.
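A minimal sketch of this argument under a simple Beta-Bernoulli model, which is an assumption not in the original: predictors have seen k successes in n trials (uniform prior), and the evaluator gathers m more trials. Averaged over what the evaluator might see, their posterior mean equals the predictors' estimate exactly (the posterior is a martingale), while the spread of possible evaluator answers is strictly positive. That spread is the extra risk mentioned above. All names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def beta_fn(a: int, b: int) -> Fraction:
    # Beta function for positive integers: (a-1)! (b-1)! / (a+b-1)!
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def evaluator_outcomes(k: int, n: int, m: int) -> list[tuple[Fraction, Fraction]]:
    """Predictors saw k successes in n trials (Beta(1, 1) prior). The evaluator
    observes m more trials. For each possible count j of extra successes, return
    (probability of j under the beta-binomial predictive, evaluator's posterior mean)."""
    a, b = k + 1, n - k + 1
    outcomes = []
    for j in range(m + 1):
        prob = comb(m, j) * beta_fn(a + j, b + m - j) / beta_fn(a, b)
        post_mean = Fraction(k + j + 1, n + m + 2)
        outcomes.append((prob, post_mean))
    return outcomes

k, n, m = 7, 10, 20
outcomes = evaluator_outcomes(k, n, m)
predictors_mean = Fraction(k + 1, n + 2)          # Prediction(I_predictors)
expected_eval_mean = sum(p * mu for p, mu in outcomes)
spread = sum(p * (mu - predictors_mean) ** 2 for p, mu in outcomes)
print(predictors_mean, expected_eval_mean, float(spread))
```

Exact fractions are used so the equality of the two means shows up without floating-point noise.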

Of course, calibration can't be taken for granted. So pragmatically, the evaluator would likely have to deal with issues of calibration.

This setup also assumes that maximally useful comments are made available to the evaluator. I think predictors will generally want the evaluator to see much of their information, as it would generally support their positions.

A relaxed version of this may be that the evaluators' duty would be to get approximately all the information that the predictors had access to, but more is not necessary.

Note that this model is only interested in the impact of good evaluation on the predictions. Evaluation would also lead to "externalities": information that would be useful in other ways as well. This information isn't included here, but I'm fine with that. I think we should generally expect predictors to be more cost-effective than evaluators at doing "prediction work" (that's the main reason we separated the two steps in the first place!)

The role of evaluation could be to ensure that predictions were reasonably calibrated and that the aggregation thus did a decent job. Evaluators don't have to outperform the aggregate if that would require information beyond what was used in the predictions.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T13:33:49.629Z · score: 2 (1 votes) · LW · GW

Yep, this way would basically be much more information-dense, with all the benefits that come from that.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T12:45:31.678Z · score: 2 (1 votes) · LW · GW

One nice thing about adjustments is that they can be applied to many forecasts. Like, I can estimate the adjustment for someone's [list of 500 forecasts] without having to look at each one.

Over time, I assume that there would be heuristics for adjustments, like, "Oh, people of this reference class typically get a +20% adjustment", similar to margins of error in engineering.

That said, these are my assumptions; I'm not sure what forecasters will find to be best in practice.
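A minimal sketch of applying one reference-class adjustment to a whole batch of forecasts at once. It assumes that a "+20% adjustment" means widening each interval forecast by 20% about its midpoint, like an engineering margin of error; that interpretation, the reference-class names, and the factors are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IntervalForecast:
    low: float
    high: float

def widen(f: IntervalForecast, factor: float) -> IntervalForecast:
    """Scale an interval about its midpoint; factor=1.2 widens it by 20%."""
    mid = (f.low + f.high) / 2
    half = (f.high - f.low) / 2 * factor
    return IntervalForecast(mid - half, mid + half)

# Hypothetical per-reference-class adjustment factors.
ADJUSTMENTS = {"new_forecaster": 1.2, "veteran": 1.0}

def adjust_all(forecasts: list[IntervalForecast], reference_class: str) -> list[IntervalForecast]:
    """Apply one reference-class adjustment to every forecast, without
    inspecting any forecast individually."""
    factor = ADJUSTMENTS[reference_class]
    return [widen(f, factor) for f in forecasts]

adjusted = adjust_all([IntervalForecast(10, 20)] * 500, "new_forecaster")
print(len(adjusted), adjusted[0])
```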

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T12:40:33.009Z · score: 2 (1 votes) · LW · GW

In that paragraph, did you mean to say "findings_i is correct"? Good point; I think you're right, and I've changed the text accordingly.

The main point I was getting at is that these two claims:

  1. Experiments are important to perform.
  2. Predictors cannot decently predict the results of experiments unless they have gigantic amounts of time.

are a bit contradictory. You can choose either, but probably not both.

Likewise, I'd expect that the experiments that are easier to predict are also the more useful ones, which is more convenient than the alternative.

I think we will generally want to estimate the importance/generality of experiments separately from their predictability.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T12:11:52.701Z · score: 2 (1 votes) · LW · GW

I'd agree that the first one is generally pretty separated from common reality, but think it's a useful thought experiment.

I was originally thinking of this more in terms of "removing useful information" than "removing expected-harmful information", but good point; the latter could be interesting too.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T12:07:02.043Z · score: 2 (1 votes) · LW · GW

There's some related academic work around this here:

They don't specifically focus on utilitarians, but the arguments are still relevant.

Also, this post is relevant:

Comment by ozziegooen on 2020's Prediction Thread · 2020-01-08T11:51:01.012Z · score: 4 (2 votes) · LW · GW

Sure. I didn't know about this post when I wrote this, but it seems similar enough.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T11:46:08.645Z · score: 2 (1 votes) · LW · GW

This sounds pretty reasonable to me; it sounds like you're basically trying to maximize expected value, but don't always trust your initial intuitions, which seems wise.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T11:44:58.589Z · score: 2 (1 votes) · LW · GW

I'd generally say that, but wouldn't be surprised if some people disagreed; their argument would sound to me like a modification of utilitarianism, [utilitarianism + epistemic terminal values].

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T11:43:25.321Z · score: 2 (1 votes) · LW · GW

It's hard to imagine what resource is more valuable than knowledge and epistemics

My thinking is that for utilitarians, these are generally instrumental, not terminal, values. Often they're pretty important instrumental values, but this would still mean that they could be traded off with respect to the terminal values. Of course, if they are "highly important" instrumental values, then something very large would have to be offered for a trade to be worth it (total annihilation being one example).

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-08T11:40:17.541Z · score: 2 (1 votes) · LW · GW

I was thinking the former, but I guess the latter could also be relevant/count. It seems like there's no strict cut-off. I'd expect a utilitarian to accept trade-offs against all these kinds of knowledge, conditional on the total expected value being positive.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-07T16:59:19.015Z · score: 2 (1 votes) · LW · GW

Basically, information that can be handled in "value of information"-style calculations. So, if I learn information that makes my understanding of the world more accurate, my knowledge has increased. For instance, if I learn the names of everyone in my extended family.
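For readers unfamiliar with the term, here is a textbook expected-value-of-perfect-information calculation: the expected payoff when you can pick the best action after learning the state, minus the best expected payoff when you must act first. The umbrella example and its numbers are made up for illustration.

```python
def expected_value_of_perfect_information(
        prior: dict[str, float],
        utility: dict[str, dict[str, float]]) -> float:
    """prior: state -> probability; utility: action -> state -> payoff.
    EVPI = E_state[best payoff knowing the state] - best expected payoff without knowing it."""
    best_without_info = max(
        sum(prior[s] * payoffs[s] for s in prior) for payoffs in utility.values()
    )
    best_with_info = sum(
        prior[s] * max(payoffs[s] for payoffs in utility.values()) for s in prior
    )
    return best_with_info - best_without_info

# Made-up example: whether to bring an umbrella.
prior = {"rain": 0.3, "dry": 0.7}
utility = {
    "umbrella":    {"rain": 5, "dry": 3},
    "no_umbrella": {"rain": 0, "dry": 6},
}
evpi = expected_value_of_perfect_information(prior, utility)
print(evpi)  # about 1.5
```

Knowledge in the sense above is information that can show up in a calculation like this, by changing which action looks best.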

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-07T11:30:18.052Z · score: 5 (3 votes) · LW · GW

Would anyone here disagree with the statement:

Utilitarians should generally be willing to accept losses of knowledge / epistemics for other resources, conditional on the expected value of the trade being positive.

Comment by ozziegooen on ozziegooen's Shortform · 2020-01-07T11:26:47.248Z · score: 4 (2 votes) · LW · GW

I feel like a decent alternative to a spiritual journey would be an epistemic journey.

An epistemic journey would basically involve something like reading a fair bit of philosophy and other thought, thinking, and becoming less wrong about the world.

Comment by ozziegooen on Machine Learning Can't Handle Long-Term Time-Series Data · 2020-01-05T10:11:16.692Z · score: 6 (4 votes) · LW · GW

This article made it to Hacker News, where it got a few comments.

Comment by ozziegooen on [Part 1] Amplifying generalist research via forecasting – Models of impact and challenges · 2020-01-04T22:03:25.842Z · score: 2 (1 votes) · LW · GW

Thanks! I really do appreciate the thoughts & feedback in general, and am quite happy to answer questions. There's a whole lot we haven't written up yet, and it's much easier for me to reply to things than lay everything out.