Comment by esrogs on Risks from Learned Optimization: Introduction · 2019-06-01T23:33:37.620Z · score: 6 (3 votes) · LW · GW

Got it, that's helpful. Thank you!

Comment by esrogs on Risks from Learned Optimization: Introduction · 2019-06-01T21:11:23.857Z · score: 20 (7 votes) · LW · GW

Very clear presentation! As someone outside the field who likes to follow along, I very much appreciate these clear conceptual frameworks and explanations.

I did however get slightly lost in section 1.2. At first reading I was expecting this part:

which we will contrast with the outer alignment problem of eliminating the gap between the base objective and the intended goal of the programmers.

to say, "... gap between the behavioral objective and the intended goal of the programmers." (In which case the inner alignment problem would be a subcomponent of the outer alignment problem.)

On second thought, I can see why you'd want to have a term just for the problem of making sure the base objective is aligned. But to help myself (and others who think similarly) keep this all straight, do you have a pithy term for "the intended goal of the programmers" that's analogous to base objective, mesa objective, and behavioral objective?

Would meta objective be appropriate?

(Apologies if my question rests on a misunderstanding or if you've defined the term I'm looking for somewhere and I've missed it.)

Comment by esrogs on Open Thread April 2019 · 2019-04-28T19:50:25.182Z · score: 2 (1 votes) · LW · GW

Apparently the author is a science writer (makes sense), and it's his first book:

I’m a freelance science writer. Until January 2018 I was science writer for BuzzFeed UK; before that, I was a comment and features writer for the Telegraph, having joined in 2007. My first book, The Rationalists: AI and the geeks who want to save the world, for Weidenfeld & Nicolson, is due to be published spring 2019. Since leaving BuzzFeed, I’ve written for the Times, the i, the Telegraph, UnHerd, politics.co.uk, and elsewhere.

https://tomchivers.com/about/

Comment by esrogs on Open Thread April 2019 · 2019-04-28T18:37:49.268Z · score: 16 (5 votes) · LW · GW

Someone wrote a book about us:

Overall, they have sparked a remarkable change. They’ve made the idea of AI as an existential risk mainstream; sensible, grown-up people are talking about it, not just fringe nerds on an email list. From my point of view, that’s a good thing. I don’t think AI is definitely going to destroy humanity. But nor do I think that it’s so unlikely we can ignore it. There is a small but non-negligible probability that, when we look back on this era in the future, we’ll think that Eliezer Yudkowsky and Nick Bostrom — and the SL4 email list, and LessWrong.com — have saved the world. If Paul Crowley is right and my children don’t die of old age, but in a good way — if they and humanity reach the stars, with the help of a friendly superintelligence — that might, just plausibly, be because of the Rationalists.

https://marginalrevolution.com/marginalrevolution/2019/04/the-ai-does-not-hate-you.html

H/T https://twitter.com/XiXiDu/status/1122432162563788800

Comment by esrogs on Book review: The Sleepwalkers by Arthur Koestler · 2019-04-25T02:40:33.084Z · score: 7 (4 votes) · LW · GW
Figuring out what's up with that seems like a major puzzle of our time.

Would be curious to hear more about your confusion and why it seems like such a puzzle. Does "when you aggregate over large numbers of things, complex lumpiness smooths out into boring sameness" not feel compelling to you?

If not, why not? Maybe you can confuse me too ;-)

Comment by esrogs on The Principle of Predicted Improvement · 2019-04-25T02:35:19.607Z · score: 15 (5 votes) · LW · GW
E[P(H|D)] ≥ E[P(H)]
In English the theorem says that the probability we should expect to assign to the true value of H after observing the true value of D is greater than or equal to the expected probability we assign to the true value of H before observing the value of D.

I have a very basic question about notation -- what tells me that H in the equation refers to the true hypothesis?

Put another way, I don't really understand why that equation has a different interpretation than the conservation-of-expected-evidence equation: E[P(H=h_i|D)] = P(H=h_i).

In both cases I would interpret it as talking about the expected probability of some hypothesis, given some evidence, compared to the prior probability of that hypothesis.
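(A toy numerical check of the two readings, using a made-up two-hypothesis setup of my own -- if I understand the distinction, the H inside the PPI is the true hypothesis, itself a random variable, while the h_i in conservation of expected evidence is one fixed hypothesis:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up two-hypothesis world: prior over H, likelihood of a binary datum D.
prior = np.array([0.3, 0.7])                 # P(H = h_0), P(H = h_1)
likelihood = np.array([[0.9, 0.1],           # P(D = d | H = h_0)
                       [0.2, 0.8]])          # P(D = d | H = h_1)

joint = prior[:, None] * likelihood          # P(H = h, D = d)
posterior = joint / joint.sum(axis=0)        # P(H = h | D = d)

n = 100_000
H = rng.choice(2, size=n, p=prior)                    # true hypothesis per trial
D = (rng.random(n) < likelihood[H, 1]).astype(int)    # sampled datum per trial

# PPI: expected probability assigned to the TRUE hypothesis after seeing D
# is at least the expected probability assigned to it beforehand.
print(posterior[H, D].mean(), ">=", prior[H].mean())

# Conservation of expected evidence: for one FIXED hypothesis h_1,
# the expected posterior just equals the prior.
print(posterior[1, D].mean(), "~=", prior[1])
```

With these numbers the first pair comes out around 0.76 vs 0.58, while the second pair both come out at 0.70 (up to sampling noise).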

Comment by esrogs on Alignment Newsletter One Year Retrospective · 2019-04-12T17:09:29.049Z · score: 6 (3 votes) · LW · GW
I think I've commented on your newsletters a few times, but haven't commented more because it seems like the number of people who would read and be interested in such a comment would be relatively small, compared to a comment on a more typical post.

I am surprised you think this. Don't the newsletters tend to be relatively highly upvoted? They're one of the kinds of links that I always automatically click on when I see them on the LW front page.

Maybe I'm basing this too much on my own experience, but I would love to see more discussion on the newsletter posts.

Comment by esrogs on Degrees of Freedom · 2019-04-03T02:00:26.862Z · score: 9 (5 votes) · LW · GW

For freedom-as-arbitrariness, see also: Slack

Comment by esrogs on Degrees of Freedom · 2019-04-03T01:18:59.695Z · score: 11 (5 votes) · LW · GW
If your car was subject to a perpetual auction and ownership tax as Weyl proposes, bashing your car to bits with a hammer would cost you even if you didn’t personally need a car, because it would hurt the rental or resale value and you’d still be paying tax.

I don't think this is right. COST stands for "Common Ownership Self-Assessed Tax". The self-assessed part refers to the idea that you personally state the value you'd be willing to sell the item for (and pay tax on that value). Once you've destroyed the item, presumably you'd be willing to part with the remains for a lower price, so you should just re-state the value and pay a lower tax.

It's true that damaging the car hurts the resale value and thus costs you (in terms of your material wealth), but this would be true whether or not you were living under a COST regime.
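(To make the mechanism concrete, here's a minimal sketch with numbers I've made up -- the point is just that the tax tracks whatever value you currently declare:)

```python
# Minimal sketch of the self-assessed tax idea (tax rate and prices made up).
TAX_RATE = 0.07

def annual_cost(declared_value):
    """Tax owed on the self-declared sale price under a COST-style scheme."""
    return TAX_RATE * declared_value

intact_car = 10_000        # price you'd accept for the working car
smashed_car = 500          # price you'd accept for the wreckage

print(annual_cost(intact_car))    # 700.0
print(annual_cost(smashed_car))   # 35.0 -- re-declaring after the damage lowers the tax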

Comment by esrogs on How good is a human's gut judgement at guessing someone's IQ? · 2019-04-03T00:19:17.970Z · score: 5 (3 votes) · LW · GW
Whatever ability IQ tests and math tests measure, I believe that lacking that ability doesn’t have any effect on one’s ability to make a good social impression or even to “seem smart” in conversation.

That section of Sarah's post jumped out at me too, because it seemed to be the opposite of my experience. In my (limited, subject-to-confirmation-bias) experience, how smart someone seems to me in conversation matches pretty well with how they did on standardized tests (or other measures of academic achievement). Obviously not perfectly, but way way better than chance.

Comment by esrogs on How good is a human's gut judgement at guessing someone's IQ? · 2019-04-03T00:09:16.628Z · score: 4 (2 votes) · LW · GW
I would also expect that courtesy of things like Dunning-Kruger, people towards the bottom will be as bad at estimating IQ as they are competence at any particular thing.

FWIW, the original Dunning-Kruger study did not show the effect that it's become known for. See: https://danluu.com/dunning-kruger/

In particular:

In two of the four cases, there's an obvious positive correlation between perceived skill and actual skill, which is the opposite of the pop-sci conception of Dunning-Kruger.
Comment by esrogs on Unconscious Economies · 2019-03-28T17:42:56.240Z · score: 10 (3 votes) · LW · GW

I'm not totally sure I'm parsing this sentence correctly. Just to clarify, "large firm variation in productivity" means "large variation in the productivity of firms" rather than "variation in the productivity of large firms", right?

Also, the second part is saying that on average there is productivity growth across firms, because the productive firms expand more than the less productive firms, yes?

Comment by esrogs on What failure looks like · 2019-03-19T17:41:21.784Z · score: 2 (1 votes) · LW · GW

Not sure exactly what you mean by "numerical simulation", but you may be interested in https://ought.org/ (where Paul is a collaborator), or in Paul's work at OpenAI: https://openai.com/blog/authors/paul/ .

Comment by esrogs on UBI for President · 2019-03-17T21:15:21.215Z · score: 2 (1 votes) · LW · GW
Just had a call with Nick Bostrom who schooled me on AI issues of the future. We have a lot of work to do.

https://twitter.com/andrewyangvfa/status/1103352317221445634

Comment by esrogs on UBI for President · 2019-03-17T21:14:37.347Z · score: 2 (1 votes) · LW · GW

This same candidate (whom the markets currently give a 5% chance of being the Democratic nominee) also wants to create a cabinet-level position to monitor emerging technology, especially AI:


Advances in automation and Artificial Intelligence (AI) hold the potential to bring about new levels of prosperity humans have never seen. They also hold the potential to disrupt our economies, ruin lives throughout several generations, and, if experts such as Stephen Hawking and Elon Musk are to be believed, destroy humanity.

...

As President, I will…
* Create a new executive department – the Department of Technology – to work with private industry and Congressional leaders to monitor technological developments, assess risks, and create new guidance. The new Department would be based in Silicon Valley and would initially be focused on Artificial Intelligence.
* Create a new Cabinet-level position of Secretary of Technology who will be tasked with leading the new Department.
* Create a public-private partnership between leading tech firms and experts within government to identify emerging threats and suggest ways to mitigate those threats while maximizing the benefit of technological innovation to society.

https://www.yang2020.com/policies/regulating-ai-emerging-technologies/

Comment by esrogs on Active Curiosity vs Open Curiosity · 2019-03-16T08:15:30.697Z · score: 2 (1 votes) · LW · GW

It seems to me that perhaps the major difference between active/concentrated curiosity and open/diffuse curiosity is how much of an expectation you have that there's one specific piece of information you could get that would satisfy the curiosity. (And for this reason the "concentrated" and "diffuse" labels do seem somewhat apt to me.)

Active/concentrated curiosity is focused on finding the answer to a specific question, while open/diffuse curiosity seeks to explore and gain understanding. (And that exploration may or may not start out with its attention on a single object/emotion/question.)

Comment by esrogs on Verifying vNM-rationality requires an ontology · 2019-03-13T21:47:12.152Z · score: 5 (3 votes) · LW · GW

See also my comment here on non-exploitability.

Comment by esrogs on Verifying vNM-rationality requires an ontology · 2019-03-13T21:39:02.770Z · score: 2 (1 votes) · LW · GW

Nitpick: I think the intro example would be clearer if there were explicit numbers of grapes/oranges rather than "some". Nothing is surprising about the original story if Beatriz got more oranges from Deion than she gave up to Callisto. (Or gave away fewer grapes to Deion than she received from Callisto.)

Comment by esrogs on Karma-Change Notifications · 2019-03-02T03:25:21.044Z · score: 29 (12 votes) · LW · GW

Unless I missed it, neither this comment nor the main post explains why you ultimately decided in favor of karma notifications. You've listed a bunch of cons -- I'm curious what the pros were.

Was it just an attempt to achieve this?

I want new users who show up on the site to feel rewarded when they engage with content
Comment by esrogs on UBI for President · 2019-02-18T00:51:49.464Z · score: 2 (1 votes) · LW · GW

Great long-form interview with Andrew Yang here: Joe Rogan Experience #1245 - Andrew Yang.

Comment by esrogs on Why do you reject negative utilitarianism? · 2019-02-14T19:36:58.335Z · score: 12 (4 votes) · LW · GW

Did you make any update regarding the simplicity / complexity of value?

My impression is that theoretical simplicity is a major driver of your preference for NU, and also that if others (such as myself) weighed theoretical simplicity more highly that they would likely be more inclined towards NU.

In other words, I think theoretical simplicity may be a double crux in the disagreements here about NU. Would you agree with that?

Comment by esrogs on Why do you reject negative utilitarianism? · 2019-02-14T19:27:58.309Z · score: 11 (7 votes) · LW · GW

Meta-note: I am surprised by the current karma rating of this question. At present, it is sitting at +9 points with 7 votes, but it would be at +2 with 6 votes w/o my strong upvote.

To those who downvoted, or do not feel inclined to upvote -- does this question not seem like a good use of LW's question system? To me it seems entirely on-topic, and very much the kind of thing I would want to see here. I found myself disagreeing with much of the text, but it seemed to be an honest question, sincerely asked.

Was it something about the wording (either of the headline or the explanatory text) that put you off?

Comment by esrogs on On Long and Insightful Posts · 2019-02-13T22:08:31.583Z · score: 6 (4 votes) · LW · GW

Relatedly: shorter articles don't need to be as well-written and engaging for me to actually read to the end of them.

I suspect, though, that there is wide variation in willingness to read long posts, perhaps explained (in part) by reading speed.

Comment by esrogs on Why do you reject negative utilitarianism? · 2019-02-11T21:03:00.320Z · score: 11 (5 votes) · LW · GW
If the rationality and EA communities are looking for a unified theory of value

Are they? Many of us seem to have accepted that our values are complex.

Absolute negative utilitarianism (ANU) is a minority view despite the theoretical advantages of terminal value monism (suffering is the only thing that motivates us “by itself”) over pluralism (there are many such things). Notably, ANU doesn’t require solving value incommensurability, because all other values can be instrumentally evaluated by their relationship to the suffering of sentient beings, using only one terminal value-grounded common currency for everything.

This seems like an argument that it would be convenient if our values were simple. This does not seem like strong evidence that they actually are simple. (Though I grant that you could make an argument that it might be better to try to achieve only part of what we value if we're much more likely to be successful that way.)

Comment by esrogs on The Case for a Bigger Audience · 2019-02-10T07:04:05.579Z · score: 3 (2 votes) · LW · GW

FWIW, I was thinking of the related relationship as a human-defined one. That is, the author (or someone else?) manually links another question as related.

Comment by esrogs on The Case for a Bigger Audience · 2019-02-10T01:25:22.828Z · score: 5 (3 votes) · LW · GW
Q&A in particular is something that I can imagine productively scaling to a larger audience, in a way that actually causes the contributions from the larger audience to result in real intellectual progress.

Do you mean scaling it as is, or in the future?

I think there's a lot of potential to innovate on the Q&A system, and I think it'd be valuable to make progress on that before trying to scale. In particular, I'd like to see some method of tracking (or taking advantage of) the structure behind questions -- something to do with how they're related to each other.

Maybe this is as simple as marking two questions as "related" (as I think you and I have discussed offline). Maybe you'd want more fine-grained relationships.

It'd also be cool to have some way of quickly figuring out what the major open questions are in some area (e.g. IDA, or value learning), or maybe what specific people consider to be important open questions.
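(Roughly the kind of thing I have in mind, as a hypothetical sketch -- the names here are my own invention, not an actual LW schema:)

```python
from collections import defaultdict

# Symmetric, human-applied "related" links between questions.
related = defaultdict(set)

def mark_related(q1, q2):
    related[q1].add(q2)
    related[q2].add(q1)

# A more fine-grained, directed relationship, e.g. "q1 is a subquestion of q2",
# which would support views like "major open questions in IDA".
subquestion_of = defaultdict(set)

mark_related("open-problems-in-value-learning", "open-problems-in-ida")
subquestion_of["how-does-distillation-lose-information"].add("open-problems-in-ida")
```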

Comment by esrogs on The Case for a Bigger Audience · 2019-02-10T01:09:54.504Z · score: 10 (6 votes) · LW · GW
Have any posts from LW 2.0 generated new conceptual handles for the community like "the sanity waterline"? If not, maybe it's because they just aren't reaching a big enough audience.

Doesn't this get the causality backwards? I'm confused about the model that would generate this hypothesis.

One way I can imagine good concepts not taking root in "the community" is if not enough of the community is reading the posts. But then why would (most of) the prescriptions seem to be about advertising to the outside world?

Comment by esrogs on When should we expect the education bubble to pop? How can we short it? · 2019-02-10T00:37:52.441Z · score: 2 (1 votes) · LW · GW

And the stories of their students are heartwarming.

Comment by esrogs on When should we expect the education bubble to pop? How can we short it? · 2019-02-10T00:30:01.943Z · score: 4 (2 votes) · LW · GW

Btw, Lambda School twitter is fun to follow. They're doing some impressive stuff.

Comment by esrogs on When should we expect the education bubble to pop? How can we short it? · 2019-02-09T23:00:13.285Z · score: 14 (5 votes) · LW · GW
2) Which assets will be more scarce/in demand as that happens? Are there currently available opportunities for "shorting" the education bubble and invest in ways which will yield profit when it pops?

Vocational schools seem like a reasonable bet. In particular something like Lambda School, where they've aligned incentives by tying tuition to alumni income.

VCs seem to agree, pouring in $14MM in a series A in October 2018, followed by an additional $30MM in a series B just 3 months later.

Comment by esrogs on Conclusion to the sequence on value learning · 2019-02-03T23:25:26.988Z · score: 25 (10 votes) · LW · GW

It seems to me that perhaps your argument about expected utility maximization being a trivial property extends back one step previous in the argument, to non-exploitability as well.

AlphaZero is better than us at chess, and so it is non-exploitable at chess (or you might say that being better at chess is the same thing as being non-exploitable at chess). If that's true, then it must also appear to us to be an expected utility maximizer. But notably the kind of EU-maximizer that it must appear to be is: one whose utility function is defined in terms of chess outcomes. AlphaZero *is* exploitable if we're secretly playing a slightly different game, like how-many-more-pawns-do-I-have-than-my-opponent-after-twenty-moves, or the game don't-get-unplugged.

Going the other direction, from EU-maximization to non-exploitability, we can point out that any agent could be thought of as an EU-maximizer (perhaps with a very convoluted utility function), and if it's very competent w.r.t. its utility function, then it will be non-exploitable by us, w.r.t. outcomes related to its utility function.

In other words, non-exploitability is only meaningful with respect to some utility function, and is not a property of "intelligence" or "competence" in general.

Would you agree with this statement?
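(The sense in which I mean "any agent could be thought of as an EU-maximizer" is the usual degenerate construction, sketched below with invented names: give the agent a utility function that awards 1 whenever it does whatever it was going to do anyway.)

```python
# Degenerate utility function that "rationalizes" any fixed policy.
def rationalizing_utility(policy):
    def utility(observation, action):
        return 1.0 if action == policy(observation) else 0.0
    return utility

# Any policy trivially maximizes its own rationalizing utility, so "maximizes
# expected utility" has no bite until you constrain the utility function --
# e.g. to chess outcomes, which is where AlphaZero's competence (and hence
# its non-exploitability) actually lives.
pawn_grabber = lambda board: "capture the nearest pawn"
U = rationalizing_utility(pawn_grabber)
print(U("some board state", pawn_grabber("some board state")))  # 1.0
```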

Comment by esrogs on How does Gradient Descent Interact with Goodhart? · 2019-02-03T20:17:00.126Z · score: 2 (1 votes) · LW · GW
when everything that can go wrong is the agent breaking the vase, and breaking the vase allows higher utility solutions

What does "breaking the vase" refer to here?

I would assume this is an allusion to the scene in The Matrix with Neo and the Oracle (where there's a paradox about whether Neo would have broken the vase if the Oracle hadn't said, "Don't worry about the vase," causing Neo to turn around to look for the vase and then bump into it), but I'm having trouble seeing how that relates to sampling and search.

Comment by esrogs on How does Gradient Descent Interact with Goodhart? · 2019-02-03T19:57:11.844Z · score: 2 (1 votes) · LW · GW

For the parenthetical in Proposed Experiment #2,

or you can train a neural net to try to copy U

should this be "try to copy V", since V is what you want a proxy for, and U is the proxy?

Comment by esrogs on Drexler on AI Risk · 2019-02-01T19:43:37.643Z · score: 4 (2 votes) · LW · GW
As I was writing the last few paragraphs, and thinking about Wei Dei's objections, I found it hard to clearly model how CAIS would handle the cancer example.

This link appears to be broken. It directs me to https://www.lesswrong.com/posts/x3fNwSe5aWZb5yXEG/reframing-superintelligence-comprehensive-ai-services-as/comment/gMZes7XnQK8FHcZsu, which does not seem to exist.

Replacing the /comment/ part with a # gives https://www.lesswrong.com/posts/x3fNwSe5aWZb5yXEG/reframing-superintelligence-comprehensive-ai-services-as#gMZes7XnQK8FHcZsu, which does work.

(Also it should be "Dai", not "Dei".)

Comment by esrogs on Applied Rationality podcast - feedback? · 2019-02-01T02:21:15.041Z · score: 11 (8 votes) · LW · GW
you should actually first try to integrate each technique and get a sense of whether it worked for you (or why it did not).

This could actually be the theme of the podcast. "Each week I try to integrate one technique and then report on how it went."

Sounds more interesting than just an explanation of what the technique is.

Comment by esrogs on Masculine Virtues · 2019-01-31T12:00:00.124Z · score: 20 (7 votes) · LW · GW

I wanted to get a better sense of the risk, so here is some arithmetic.

Putting together one of the quotes above:

An estimated 300,000 sport-related traumatic brain injuries, predominantly concussions, occur annually in the United States.

And this bit from the recommended Prognosis section:

Most TBIs are mild and do not cause permanent or long-term disability; however, all severity levels of TBI have the potential to cause significant, long-lasting disability. Permanent disability is thought to occur in 10% of mild injuries, 66% of moderate injuries, and 100% of severe injuries.

And this bit from the Epidemiology section:

a US study found that moderate and severe injuries each account for 10% of TBIs, with the rest mild.

We get that there are 300k sport-related TBIs per year in the US, and of those, 240k are mild, 30k are moderate, and 30k are severe. Those severity levels together result in 24k + 20k + 30k ~= 75k cases of permanent disability per year.

To put that in perspective, we can compare to another common activity that has potential to cause harm:

In 2010, there were an estimated 5,419,000 crashes, 30,296 with fatalities, killing 32,999, and injuring 2,239,000.

If we say that a fatality and a permanent disability due to brain injury are the same order of magnitude of badness, this suggests that sports and traveling by car expose the average (as in mean, not median) American to roughly the same level of risk.

Would be interesting to dig deeper to see how much time Americans spend in cars vs playing sports on average (and then you'd also want to look at the benefits you get from each), but I'll stop here for now.
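(The arithmetic spelled out, using only the figures quoted above:)

```python
sport_tbis = 300_000                      # sport-related TBIs per year in the US

mild     = 0.80 * sport_tbis              # 240k
moderate = 0.10 * sport_tbis              #  30k
severe   = 0.10 * sport_tbis              #  30k

permanent = 0.10 * mild + 0.66 * moderate + 1.00 * severe
print(permanent)                          # 73,800 -- the ~75k figure above

traffic_deaths_2010 = 32_999
print(permanent / traffic_deaths_2010)    # ~2.2, i.e. the same order of magnitude
```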

Comment by esrogs on [Link] Did AlphaStar just click faster? · 2019-01-29T04:41:53.078Z · score: 2 (1 votes) · LW · GW
but this seems like reason to doubt that AI has surpassed human strategy in StarCraft

I think Charlie might be suggesting that AlphaStar would be superior to humans, even with only human or subhuman APM, because the precision of those actions would still be superhuman, even if the total number was slightly subhuman:

the micro advantage for 98% of the game isn't because it's clicking faster, its clicks are just better

This wouldn't necessarily mean that AlphaStar is better at strategy.

Comment by esrogs on [Link] Did AlphaStar just click faster? · 2019-01-28T22:48:37.267Z · score: 7 (6 votes) · LW · GW
"Does perfect stalker micro really count as intelligence?"

Love this bit.

the evidence is pretty strong that AlphaStar (at least the version without attention that just perceived the whole map) could beat humans under whatever symmetric APM cap you want

This does not seem at all clear to me. Weren't all the strategies using micro super-effectively? And apparently making other human-detectable mistakes? Seems possible that AlphaStar would win anyway without the micro, but not at all certain.

Comment by esrogs on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-01-26T21:11:19.762Z · score: 8 (5 votes) · LW · GW

Interesting analysis here:

I will try to make a convincing argument for the following:
1. AlphaStar played with superhuman speed and precision.
2. Deepmind claimed to have restricted the AI from performing actions that would be physically impossible to a human. They have not succeeded in this and most likely are aware of it.
3. The reason why AlphaStar is performing at superhuman speeds is most likely due to its inability to unlearn the human players' tendency to spam click. I suspect Deepmind wanted to restrict it to a more human-like performance but are simply not able to. It's going to take us some time to work our way to this point but it is the whole reason why I'm writing this so I ask you to have patience.
Comment by esrogs on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-01-26T19:38:02.994Z · score: 9 (6 votes) · LW · GW
I think it's quite possible that when they instituted the cap they thought it was fair, however from the actual gameplay it should be obvious to anyone who is even somewhat familiar with Starcraft II (e.g., many members of the AlphaStar team) that AlphaStar had a large advantage in "micro", which in part came from the APM cap still allowing superhumanly fast and accurate actions at crucial times. It's also possible that the blogpost and misleading APM comparison graph were written by someone who did not realize this, but then those who did realize should have objected to it and had it changed after they noticed.

It's not so obvious to me that someone who realizes that AlphaStar is superior at "micro" should have objected to those graphs.

Think about it like this -- you're on the DeepMind team, developing AlphaStar, and the whole point is to make it superhuman at StarCraft. So there's going to be some part of the game that it's superhuman at, and to some extent this will be "unfair" to humans. The team decided to try not to let AlphaStar have "physical" advantages, but I don't see any indication that they explicitly decided that it should not be better at "micro" or unit control in general, and should only win on "strategy".

Also, separating "micro" from "strategy" is probably not that simple for a model-free RL system like this. So I think they made a very reasonable decision to focus on a relatively easy-to-measure APM metric. When the resulting system doesn't play exactly as humans do, or in a way that would be easy for humans to replicate, to me it doesn't seem so-obvious-that-you're-being-deceptive-if-you-don't-notice-it that this is "unfair" and that you should go back to the drawing board with your handicapping system.

It seems to me that which ways for AlphaStar to be superhuman are "fair" or "unfair" is to some extent a matter of taste, and there will be many cases that are ambiguous. To give a non "micro" example -- suppose AlphaStar is able to better keep track of exactly how many units its opponent has (and at what hit point levels) throughout the game, than a human can, and this allows it to make just slightly more fine-grained decisions about which units it should produce. This might allow it to win a game in a way that's not replicable by humans. It didn't find a new strategy -- it just executed better. Is that fair or unfair? It feels maybe less unfair than just being super good at micro, but exactly where the dividing line is between "interesting" and "uninteresting" ways of winning seems not super clear.

Of course, now that a much broader group of StarCraft players has seen these games, and a consensus has emerged that this super-micro does not really seem fair, it would be weird if DeepMind did not take that into account for its next release. I will be quite surprised if they don't adjust their setup to reduce the micro advantage going forward.

Comment by esrogs on Vote counting bug? · 2019-01-24T00:03:43.813Z · score: 2 (1 votes) · LW · GW

Fixed link here.

Comment by esrogs on Disentangling arguments for the importance of AI safety · 2019-01-22T01:23:49.885Z · score: 6 (3 votes) · LW · GW

To me the difference is that when I read 5 I'm thinking about people being careless or malevolent, in an everyday sense of those terms, whereas when I read 4 I'm thinking about how maybe there's no such thing as a human who's not careless or malevolent, if given enough power and presented with a weird enough situation.

Comment by esrogs on Towards formalizing universality · 2019-01-16T21:27:49.665Z · score: 4 (2 votes) · LW · GW
Simple caricatured examples:
* C might propose a design for a computer that has a backdoor that an attacker can use to take over the computer. But if this backdoor will actually be effective, then A[C] will know about it.
* C might propose a design that exploits a predictable flaw in A's reasoning (e.g. overlooking consequences of a certain kind, being overly optimistic about some kinds of activities, incorrectly equating two importantly different quantities...). But then A[C] will know about it, and so if A[C] actually reasons in that way then (in some sense) it is endorsed.

These remind me of Eliezer's notions of Epistemic and instrumental efficiency, where the first example (about the backdoor) would roughly correspond to A[C] being instrumentally efficient relative to C, and the second example (about potential bias) would correspond to A[C] being epistemically efficient relative to C.

Comment by esrogs on What are the open problems in Human Rationality? · 2019-01-14T19:58:14.963Z · score: 12 (6 votes) · LW · GW

Also superforecasting and GJP are no longer new. Seems not at all surprising that most of the words written about them would be from when they were.

Comment by esrogs on Strategic High Skill Immigration · 2019-01-14T19:51:07.559Z · score: 2 (1 votes) · LW · GW
Quan Xuesen

Qian, not Quan. Pronounced something like if you said "chee-ann" as one syllable.

Comment by esrogs on What are the open problems in Human Rationality? · 2019-01-14T05:49:53.043Z · score: 4 (3 votes) · LW · GW
What are genuinely confusing problems at the edge of the current rationality field – perhaps far away from the point where even specialists can implement them yet, but where we seem confused in a basic way about how the mind works, or how probability or decision theory work.

For one example of this, see Abram's most recent post, which begins: "So... what's the deal with counterfactuals?" :-)

Comment by esrogs on Towards formalizing universality · 2019-01-14T05:35:46.030Z · score: 5 (2 votes) · LW · GW
The only thing that makes it dominate C is the fact that C can do actual work that causes its beliefs to be accurate.

Was this meant to read, "The only thing that makes it hard to dominate C ...", or something like that? I don't quite understand the meaning as written.

Comment by esrogs on Towards formalizing universality · 2019-01-13T23:34:27.651Z · score: 6 (3 votes) · LW · GW

I think this ascription is meant to be pretty informal and general. So you could say for example that quicksort believes that 5 is less than 6.

I don't think there's meant to be any presumption that the inner workings of the algorithm are anything like a mind. That's my read from section I.2. Ascribing beliefs to arbitrary computations.

Comment by esrogs on What are the open problems in Human Rationality? · 2019-01-13T22:26:45.086Z · score: 2 (1 votes) · LW · GW

Interesting! Would you be willing to give a brief summary?

Comment by esrogs on What are the open problems in Human Rationality? · 2019-01-13T08:12:29.368Z · score: 16 (9 votes) · LW · GW

I don't have a crisp question yet, but one general area I'd be interested in understanding better is the interplay between inside views and outside views.

In some cases, having some outside view probability in mind can guide your search (e.g. "No, that can't be right because then such and such, and I have a prior that such and such is unlikely."), while in other cases, thinking too much about outside views seems like it can distract you from exploring underlying models (e.g. when people talk about AI timelines in a way that just seems to be about parroting and aggregating other people's timelines).

A related idea is the distinction between impressions and beliefs. In this view impressions are roughly inside views (what makes sense to you given the models and intuitions you have), while beliefs are what you'd bet on (taking into account the opinions of others).

I have some intuitions and heuristics about when it's helpful to focus on impressions vs beliefs. But I'd like to have better explicit models here, and I suspect there might be some interesting open questions in this area.

Henry Kissinger: AI Could Mean the End of Human History

2018-05-15T20:11:11.136Z · score: 46 (10 votes)

AskReddit: Hard Pills to Swallow

2018-05-14T11:20:37.470Z · score: 17 (6 votes)

Predicting Future Morality

2018-05-06T07:17:16.548Z · score: 22 (8 votes)

AI Safety via Debate

2018-05-05T02:11:25.655Z · score: 40 (9 votes)

FLI awards prize to Arkhipov’s relatives

2017-10-28T19:40:43.928Z · score: 12 (5 votes)

Functional Decision Theory: A New Theory of Instrumental Rationality

2017-10-20T08:09:25.645Z · score: 36 (13 votes)

A Software Agent Illustrating Some Features of an Illusionist Account of Consciousness

2017-10-17T07:42:28.822Z · score: 16 (3 votes)

Neuralink and the Brain’s Magical Future

2017-04-23T07:27:30.817Z · score: 6 (7 votes)

Request for help with economic analysis related to AI forecasting

2016-02-06T01:27:39.810Z · score: 6 (7 votes)

[Link] AlphaGo: Mastering the ancient game of Go with Machine Learning

2016-01-27T21:04:55.183Z · score: 14 (15 votes)

[LINK] Deep Learning Machine Teaches Itself Chess in 72 Hours

2015-09-14T19:38:11.447Z · score: 8 (9 votes)

[Link] First almost fully-formed human [foetus] brain grown in lab, researchers claim

2015-08-19T06:37:21.049Z · score: 7 (8 votes)

[Link] Neural networks trained on expert Go games have just made a major leap

2015-01-02T15:48:16.283Z · score: 15 (16 votes)

[LINK] Attention Schema Theory of Consciousness

2013-08-25T22:30:01.903Z · score: 3 (4 votes)

[LINK] Well-written article on the Future of Humanity Institute and Existential Risk

2013-03-02T12:36:39.402Z · score: 16 (19 votes)

The Center for Sustainable Nanotechnology

2013-02-26T06:55:18.542Z · score: 4 (11 votes)