Is a near-term, self-sustaining Mars colony impossible? 2020-06-03T22:43:08.501Z · score: 12 (7 votes)
ESRogs's Shortform 2020-04-29T08:03:28.820Z · score: 7 (1 votes)
Dominic Cummings: "we’re hiring data scientists, project managers, policy experts, assorted weirdos" 2020-01-03T00:33:09.994Z · score: 81 (21 votes)
'Longtermism' definitional discussion on EA Forum 2019-08-02T23:53:03.731Z · score: 17 (6 votes)
Henry Kissinger: AI Could Mean the End of Human History 2018-05-15T20:11:11.136Z · score: 46 (10 votes)
AskReddit: Hard Pills to Swallow 2018-05-14T11:20:37.470Z · score: 17 (6 votes)
Predicting Future Morality 2018-05-06T07:17:16.548Z · score: 25 (10 votes)
AI Safety via Debate 2018-05-05T02:11:25.655Z · score: 40 (9 votes)
FLI awards prize to Arkhipov’s relatives 2017-10-28T19:40:43.928Z · score: 12 (5 votes)
Functional Decision Theory: A New Theory of Instrumental Rationality 2017-10-20T08:09:25.645Z · score: 36 (13 votes)
A Software Agent Illustrating Some Features of an Illusionist Account of Consciousness 2017-10-17T07:42:28.822Z · score: 16 (3 votes)
Neuralink and the Brain’s Magical Future 2017-04-23T07:27:30.817Z · score: 6 (7 votes)
Request for help with economic analysis related to AI forecasting 2016-02-06T01:27:39.810Z · score: 6 (7 votes)
[Link] AlphaGo: Mastering the ancient game of Go with Machine Learning 2016-01-27T21:04:55.183Z · score: 14 (15 votes)
[LINK] Deep Learning Machine Teaches Itself Chess in 72 Hours 2015-09-14T19:38:11.447Z · score: 8 (9 votes)
[Link] First almost fully-formed human [foetus] brain grown in lab, researchers claim 2015-08-19T06:37:21.049Z · score: 7 (8 votes)
[Link] Neural networks trained on expert Go games have just made a major leap 2015-01-02T15:48:16.283Z · score: 15 (16 votes)
[LINK] Attention Schema Theory of Consciousness 2013-08-25T22:30:01.903Z · score: 3 (4 votes)
[LINK] Well-written article on the Future of Humanity Institute and Existential Risk 2013-03-02T12:36:39.402Z · score: 16 (19 votes)
The Center for Sustainable Nanotechnology 2013-02-26T06:55:18.542Z · score: 4 (11 votes)


Comment by esrogs on Learning the prior · 2020-07-05T23:24:36.589Z · score: 2 (1 votes) · LW · GW

> In this case, I can pay humans to make forecasts for many randomly chosen x* in D*, train a model f to predict those forecasts, and then use f to make forecasts about the rest of D*.

> The generalization is now coming entirely from human beliefs, not from the structure of the neural net — we are only applying neural nets to iid samples from D*.
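To make the quoted procedure concrete, here is a minimal sketch. Everything in it is a hypothetical stand-in. `human_forecast` plays the role of paying a human for a forecast on one input. A one-dimensional least-squares line plays the role of the neural net f. A toy list of integers stands in for D*. The sketch only shows the shape of the pipeline: label a random subset, fit to those labels, and apply the fit to the rest.

```python
import random

def human_forecast(x):
    # Hypothetical stand-in for paying a human to forecast on input x.
    return 0.5 * x + 1.0

def fit_line(xs, ys):
    # Ordinary least squares for y = a * x + b, standing in for training f.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def forecast_dataset(d_star, n_labeled, seed=0):
    rng = random.Random(seed)
    labeled = rng.sample(d_star, n_labeled)          # randomly chosen x* in D*
    f = fit_line(labeled, [human_forecast(x) for x in labeled])
    labeled_set = set(labeled)
    # Use f to forecast on the rest of D*.
    return {x: f(x) for x in d_star if x not in labeled_set}

preds = forecast_dataset(list(range(100)), n_labeled=20)
print(len(preds))  # 80
```

Because the toy human forecasts are exactly linear, the fitted line reproduces them. The sketch deliberately glosses over the worry raised here, namely whether the fitted model's own prior leaks back in.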

Perhaps a dumb question, but don't we now have the same problem at one remove? The model for predicting what the human would predict would still come from a "strange" prior (based on the l2 norm, or whatever).

Does the strangeness just get washed out by the one layer of indirection? Would you ever want to do two (or more) steps, and train a model to predict what a human would predict a human would predict?

Comment by esrogs on Better priors as a safety problem · 2020-07-05T22:57:51.868Z · score: 2 (1 votes) · LW · GW

> This roughly tracks what’s going on in our real beliefs, and why it seems absurd to us to infer that the world is a dream of a rational agent—why think that the agent will assign higher probability to the real world than the “right” prior? (The simulation argument is actually quite subtle, but I think that after all the dust clears this intuition is basically right.)

I didn't quite follow this bit. In particular, I'm not sure which of "real world" and "right prior" refers to an actual physical world, and which refers to a simulation or dream (or if that's even the right way to distinguish between the two).

I think this is saying something about having a prior over base-level universes or over simulated (or imagined) universes. And I think maybe it (and the surrounding context) is saying that it's more useful to have a prior that you're in a "real" universe (because otherwise you maybe don't care what happens). But I'm not confident of that interpretation.

Is that on the right track?

Comment by esrogs on ESRogs's Shortform · 2020-07-04T22:57:05.629Z · score: 3 (2 votes) · LW · GW

These were all personal connections / opportunity-arose situations.

The closest I've done to a systematic search was once asking someone who'd done a bunch of angel investments if there were any he'd invested in who were looking for more money and whom he was considering investing more in. That was actually my first angel investment (Pantelligent) and it ended up not working out. (But of course that's the median expected outcome.)

(The other two that I invested in that are not still going concerns were AgoraFund and AlphaSheets. Both of those were through personal connections as well.)

Comment by esrogs on Situating LessWrong in contemporary philosophy: An interview with Jon Livengood · 2020-07-03T20:19:40.152Z · score: 2 (1 votes) · LW · GW

Are you imagining them competing for two different pools of funding?

Comment by esrogs on The Puzzling Linearity of COVID-19 · 2020-07-02T07:49:49.980Z · score: 2 (1 votes) · LW · GW

His model sounds quite a bit like the dance portion of Tomas Pueyo's The Hammer and the Dance.

Comment by esrogs on Non offensive word for people who are not single-magisterium-Bayes thinkers · 2020-07-02T07:07:46.049Z · score: 2 (1 votes) · LW · GW

> This great LW post uses the phrases Toolboxism and Single-Magisterium Bayes to describe the two ways of thinking

Was this meant to include a link to that post?

Comment by esrogs on The ground of optimization · 2020-07-02T05:01:51.526Z · score: 2 (1 votes) · LW · GW

Ah, good points!

Comment by esrogs on The ground of optimization · 2020-07-02T01:42:31.166Z · score: 2 (1 votes) · LW · GW

Hmm, I see what you're saying, but there still seems to me to be an analogy here with arbitrary utility functions, where you need the set of target states to be small (as you do say). Otherwise I could just say that the set of target states is all the directions the system might fly off in if you perturb it.

So you might say that, for this version of optimization to be meaningful, the set of target states has to be small (however that's quantified), and for the utility maximization version to be meaningful, you need the utility function to be simple (however that's quantified).

EDIT: And actually, maybe the two concepts are sort of dual to each other. If you have an agent with a simple utility function, then you could consider all its local optima to be a (small) set of target states for an optimizing system. And if you have an optimizing system with a small set of target states, then you could easily convert that into a simple utility function with a gradient towards those states.

And if your utility function isn't simple, maybe you wouldn't get a small set of target states when you do the conversion, and vice versa?
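Here is a toy illustration of the suggested duality on a one-dimensional state space, where states are integers on a line and each state's neighbors are the adjacent integers. Both functions are hypothetical. `utility_from_targets` turns a set of target states into a simple utility that increases toward the nearest target. `local_maxima` turns a utility function back into the set of states that beat all their neighbors. The second example shows a less simple utility producing a larger target set.

```python
def utility_from_targets(targets):
    # Target set -> utility: higher (less negative) the closer a state is
    # to its nearest target, giving a gradient toward the targets.
    return lambda s: -min(abs(s - t) for t in targets)

def local_maxima(utility, states):
    # Utility -> target set: states strictly better than every neighbor
    # (neighbors are s - 1 and s + 1, when present).
    present = set(states)
    result = []
    for s in states:
        neighbors = [n for n in (s - 1, s + 1) if n in present]
        if all(utility(s) > utility(n) for n in neighbors):
            result.append(s)
    return result

def alternating(s):
    # A less simple utility: good on odd states, bad on even ones.
    return s % 2

states = list(range(11))
print(local_maxima(utility_from_targets({2, 8}), states))  # [2, 8]
print(local_maxima(alternating, states))                   # [1, 3, 5, 7, 9]
```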

Comment by esrogs on The ground of optimization · 2020-07-01T07:43:50.258Z · score: 2 (1 votes) · LW · GW

> I briefly defend it in a comment thread on this post though ( )

FYI: I think something got messed up with this link. The text of the link is a valid url, but it links to a mangled one (s.t. if you click it you get a 404 error).

Comment by esrogs on The ground of optimization · 2020-07-01T07:37:30.632Z · score: 2 (1 votes) · LW · GW

> But if we use an ordering over states then we run into the following problem: how can we say whether a system is robust to perturbations? Is it just that the system continues to climb the preference gradient despite perturbations? But now every system is an optimizing system, because we can always come up with some preference ordering that explains a system as an optimizing system. So then we can say "well it should be an ordering over states with a compact representation" or "it should be more compact than competing explanations". This may be okay but it seems quite dicey to me.

Doesn't the set-of-target-states version have just the same issue (or an analogous one)?

For whatever behavior the system exhibits, I can always say that the states it ends up in were part of its set of target states. So you have to count on compactness (or naturalness of description, which is basically the same thing) of the set of target states for this concept of an optimizing system to be meaningful. No?

Comment by esrogs on The ground of optimization · 2020-06-30T23:59:21.883Z · score: 4 (2 votes) · LW · GW

> Deep learning AGI implies mesa optimization: Since deep learning is so sample inefficient, it cannot reach human levels of performance if we apply deep learning directly to each possible task T. (For example, it has to relearn how the world works separately for each task T.) As a result, if we do get AGI primarily via deep learning, it must be that we used deep learning to create a new optimizing AI system, and that system was the AGI.

I don't quite understand what this is saying.

Suppose we train a giant deep learning model via self-supervised learning on a ton of real-world data (like GPT-N, but w/ other sensory modalities besides text), and then we build a second system designed to provide a nice interface to the giant model.

We'd give task specifications to the interface, and it would have some smarts about how to consult the model to figure out what to do. (The interface might also be learned, via reinforcement or supervised learning, or it might be hand-coded.)
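Here is a minimal sketch of the hand-coded variant of such an interface, under stated assumptions. The large model is abstracted as a hypothetical callable that scores a (task, candidate action) pair. The interface simply asks it to score each candidate and returns the best one. `toy_model` is a word-overlap stand-in and is nothing like a real model.

```python
from typing import Callable, Sequence

# Hypothetical abstraction of the big self-supervised model:
# (task description, candidate action) -> predicted score.
PredictiveModel = Callable[[str, str], float]

def hand_coded_interface(model: PredictiveModel, task: str,
                         candidates: Sequence[str]) -> str:
    # Minimal "smarts": consult the model on every candidate, pick the best.
    return max(candidates, key=lambda action: model(task, action))

def toy_model(task: str, action: str) -> float:
    # Illustrative only: count the words the task and action share.
    return float(len(set(task.split()) & set(action.split())))

choice = hand_coded_interface(toy_model, "boil some water",
                              ["read a book", "boil the kettle water"])
print(choice)  # -> boil the kettle water
```

Nearly all of the capability would live in `model`. The interface is a few lines of search over its outputs, which is the division of labor described above.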

It seems plausible to me that a system comprising these two pieces, the model and the interface, could be an AGI according to the definition here, in that when combined with a very wide variety of environments (including the task specification in the environment), it could perform at least as well as a human.

And since most of the smarts seem like they'd be in the model rather than the interface, I'd count it as getting AGI "primarily via deep learning", even if the interface was hand-coded.

But it's not clear to me whether that would count as using deep learning to "create a new optimizing AI system", which is itself the AGI. The whole system is an Optimizing AI, according to the definition given above, but neither of the two parts is by itself, and it doesn't seem to have the flavor of mesa-optimization, as I understand it. So it seems like a contradiction to the quoted claim.

Have I misunderstood what you're saying here, or do you disagree with the characterization I gave of the hypothetical model + interface system? (Or have I perhaps misunderstood mesa-optimization?)

Comment by esrogs on Covid 6/25: The Dam Breaks · 2020-06-26T22:12:19.663Z · score: 4 (2 votes) · LW · GW

> your claim is that "civilization" explains why the US handled Covid-19 so poorly

The claim is not that civilization itself is inadequate. It's that a particular civilization is inadequate.

> the fact that other countries handled Covid-19 very differently constitutes evidence against the "civilizational inadequacy" hypothesis

The "civilizational inadequacy" hypothesis is not that civilization = bad. It's that a particular civilization is not living up to the standard of what you would expect from a well-functioning civilization.

Maybe it seems odd to describe different countries as different civilizations, but the fact that different countries have different outcomes seems very much in line with the "civilizational inadequacy" hypothesis, as I understand Zvi to be using the term.

Comment by esrogs on SlateStarCodex deleted because NYT wants to dox Scott · 2020-06-25T17:17:46.535Z · score: 4 (2 votes) · LW · GW

> This LW thread is almost entirely about mistake theory.

This comment section is not what I was responding to. (There weren't many comments on this post when I made mine.) I was responding to what I'd seen in general across media, and yeah, a lot of that was on Twitter. Apologies for the ambiguous wording.

Comment by esrogs on SlateStarCodex deleted because NYT wants to dox Scott · 2020-06-23T23:00:56.637Z · score: 2 (1 votes) · LW · GW

Fair points.

Comment by esrogs on SlateStarCodex deleted because NYT wants to dox Scott · 2020-06-23T22:35:50.862Z · score: 4 (2 votes) · LW · GW

> We are in a situation where the decision whether or not to publish Scott's name isn't yet made. As such it's important to build up pressure to affect that decision and it's not useful to be charitable.

I don't think it's as cut and dried as that. I think Scott's move to delete the blog was a reasonable one. But after that, it's not clear to me whether all of us effectively saying "Fuck you!" to the NYT is more likely to result in them not publishing the name than something more like, "Hey, I know you've got norms in favor of publishing real names, but I think you're making a mistake here, and hopefully the fact that Scott actually deleted his blog makes you realize he was more serious about this than you might have thought. I hope you make the right decision."

Like, maybe the latter won't work. But it's not obvious to me one way or the other. It seems like it depends on facts about the state of mind of various folks who work at the NYT that are hard for us to know.

EDIT: Or maybe a better way to put it is that being charitable might be part of how you "build up pressure to affect that decision". See Richard and Patrick's threads here. A charitable reading of what's happening from Metz's perspective might factor into your calculus of how to act to get the result you want.

Comment by esrogs on ESRogs's Shortform · 2020-06-23T19:38:12.407Z · score: 1 (4 votes) · LW · GW

Please participate in my poll for the correct spelling of the term that starts with a 'd' and rhymes with lox and socks:

Comment by esrogs on SlateStarCodex deleted because NYT wants to dox Scott · 2020-06-23T18:44:01.022Z · score: 52 (37 votes) · LW · GW

In general, responses I've seen so far to this have seemed to come more from a "conflict theory" (rather than "mistake theory") interpretation of what's going on. And perhaps too much so.

I thought these comments by ricraz were a good contribution to the discussion: 

> Scott Alexander is the most politically charitable person I know. Him being driven off the internet is terrible. Separately, it is also terrible if we have totally failed to internalise his lessons, and immediately leap to the conclusion that the NYT is being evil or selfish.

> Ours is a community *built around* the long-term value of telling the truth. Are we unable to imagine reasonable disagreement about when the benefits of revealing real names outweigh the harms? Yes, it goes against our norms, but different groups have different norms.

> If the extended rationalist/SSC community could cancel the NYT, would we? For planning to doxx Scott? For actually doing so, as a dumb mistake? For doing so, but for principled reasons? Would we give those reasons fair hearing? From what I've seen so far, I suspect not.

> I feel very sorry for Scott, and really hope the NYT doesn't doxx him or anyone else. But if you claim to be charitable and openminded, except when confronted by a test that affects your own community, then you're using those words as performative weapons, deliberately or not.

Comment by esrogs on [Site Meta] Feature Update: More Tags! (Experimental) · 2020-06-22T20:49:33.740Z · score: 4 (2 votes) · LW · GW

Just added the Efficient Market Hypothesis tag to that post.

Comment by esrogs on ESRogs's Shortform · 2020-06-22T01:45:25.586Z · score: 2 (1 votes) · LW · GW

Good writeup here explaining some details of the structure of the reverse IPO, and how that's affecting share, warrant, and option prices. H/T Wei Dai

Comment by esrogs on Likelihood of hyperexistential catastrophe from a bug? · 2020-06-18T22:31:03.254Z · score: 2 (1 votes) · LW · GW

One is the difference between training time and deployment, as others have mentioned. But the other is that I'm skeptical that there will be a singleton AI that was just trained via reinforcement learning.

Like, we're going to train a single neural network end-to-end on running the world? And just hand over the economy to it? I don't think that's how it's going to go. There will be interlocking more-and-more powerful systems. See: Arguments about fast takeoff.

Comment by esrogs on Likelihood of hyperexistential catastrophe from a bug? · 2020-06-18T20:39:37.093Z · score: 4 (2 votes) · LW · GW

I think AI systems should be designed in such a way as to avoid being susceptible to sign flips (as Eliezer argues in that post you linked), but I also suspect this is likely to happen naturally in the course of developing the systems. While a sign flip may occur in some local area, you'd have to have no checksums at all on the process for the result of a sign-flipped reward function to end up in control.
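Here is one hypothetical form such a checksum could take. Before trusting a reward function, confirm that it ranks a handful of reference outcomes in the order we already know is right. The function names and the toy reward are illustrative assumptions, not anything from the linked post. A sign-flipped reward fails the check immediately.

```python
def reward_sign_check(reward_fn, preferred, dispreferred):
    # Every known-good reference outcome must score above every known-bad one.
    return min(reward_fn(x) for x in preferred) > max(reward_fn(x) for x in dispreferred)

def reward(x):
    # Toy reward: bigger outcomes are better.
    return x

def flipped(x):
    # The same reward with its sign flipped by a bug.
    return -reward(x)

print(reward_sign_check(reward, [5, 6], [0, 1]))   # True
print(reward_sign_check(flipped, [5, 6], [0, 1]))  # False
```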

Comment by esrogs on Mod Notice about Election Discussion · 2020-06-17T06:11:09.307Z · score: 4 (2 votes) · LW · GW

Ah, I see what you mean. This kind of discussion is not what comes to mind from the phrase "discuss politics", though. I think that was the source of confusion.

If the goal is to discuss abstract patterns that come up in politics (vs what I would think of as "discussing politics", namely discussions about politicians and policies and elections, etc), then I agree the non-loaded, made up examples are better.

Comment by esrogs on Open & Welcome Thread - June 2020 · 2020-06-17T00:57:11.864Z · score: 3 (2 votes) · LW · GW

Hmm, this seems a little different from Goodhart's law (or at least it's a particular special case that deserves its own name).

This concept, as I understand it, is not about picking the wrong metric to optimize. It's more like picking the wrong metric to satisfice, or putting the bar for satisficing in the wrong place.

Comment by esrogs on Open & Welcome Thread - June 2020 · 2020-06-17T00:50:38.751Z · score: 3 (2 votes) · LW · GW

Are you most concerned that:

1) they will believe false things (which is bad for its own sake)
2) they will do harm to others due to false beliefs
3) harm will come to them because of their false beliefs
4) they will become alienated from you because of your disagreements with each other
5) something else?

It seems like these different possibilities would suggest different mitigations. For example, if the threat model is that they just adopt the dominant ideology around them (which happens to be false on many points), then that results in them having false beliefs (#1), but may not cause any harm to come to them from it (#3) (and may even be to their benefit, in some ways).

Similarly, depending on whether you care more about #1 or #4, you may try harder to correct their false ideas, or to establish a norm for your relationship that it's fine to disagree with each other. (Though I suspect that, generally speaking, efforts that tend to produce a healthy relationship will also tend to produce true beliefs, in the long run.)

Comment by esrogs on Mod Notice about Election Discussion · 2020-06-17T00:05:08.545Z · score: 2 (1 votes) · LW · GW

> Object-level harms to the discourse from using political examples. It's both harder for people to discuss politics, and harder for them to agree on the right abstractions. If you discuss the abstractions directly, you can avoid those issues.

I don't quite follow what this is trying to say. It's harder to talk about politics if you use political examples?

As a general rule, if you want to communicate clearly, it's better to give examples than to only use abstractions. I can understand an argument that it's undesirable to talk about politics except in very abstract terms, because it will tend to interfere with other discussions. But I'm confused by the apparent claim that even if you want to talk about politics itself, using examples is bad.

(If that even is what the quoted bit is trying to say. I'm having trouble parsing its sentences.)

Comment by esrogs on The Economic Consequences of Noise Traders · 2020-06-15T03:33:01.635Z · score: 6 (3 votes) · LW · GW

Worth noting that the paper is from 1987. (Though unclear whether the empirical results referenced in the abstract would have changed since then.)

Comment by esrogs on Why artificial optimism? · 2020-06-13T06:19:08.032Z · score: 2 (1 votes) · LW · GW

> I would rather that conditions in the universe are good for the lifeforms

How do you measure this? What does it mean for conditions in the universe to be good for the lifeforms, other than that they give them good experiences?

You're wanting to ground positive emotions in objectively good states. But I'm wanting to ground the goodness of states in the positive emotions they produce.

Perhaps there's some reflexivity here, where we both evaluate positive emotions based on how well they track reality, and we also evaluate reality on how much it produces positive emotions. But we need some way for it to bottom out.

For me, I would think positive emotions are more fundamentally good than universe states, so that seems like a safer place to ground the recursion. But I'm curious if you've got another view.

Comment by esrogs on Why artificial optimism? · 2020-06-13T04:12:42.839Z · score: 2 (1 votes) · LW · GW

I get the analogy. And I guess I'd agree that I value complex positive emotions that are intertwined with the world more than sort of one-note ones. (E.g. being on molly felt nice but kind of empty.)

But I don't think there's much intrinsic value in the world other than the experiences of sentient beings.

A cold and lifeless universe seems not that valuable. And if the universe has life I want those beings to be happy, all else equal. What do you want?

And regarding the evolutionary perspective, what do I care what's fit or not? My utility function is not inclusive genetic fitness.

Comment by esrogs on Turns Out Interruptions Are Bad, Who Knew? · 2020-06-12T18:59:54.037Z · score: 2 (1 votes) · LW · GW

This is a factor for me too.

Comment by esrogs on Why artificial optimism? · 2020-06-12T05:12:03.442Z · score: 2 (1 votes) · LW · GW

> People often believe that it's inherently good to be happy, rather than thinking that their happiness level should track the actual state of affairs (and thus be a useful tool for emotional processing and communication). Why?

Isn't your happiness level one of the most important parts of the "actual state of affairs"? How would you measure the value of the actual state of affairs other than according to how it affects your (or others') happiness?

It seems to me that it is inherently good to be happy. All else equal, being happier is better.

That said, I agree that it's good to pay a cost in temporarily lower happiness (e.g. for emotional processing, etc) to achieve more happiness later. If that's all you mean -- that the optimal strategy allows for temporary unhappiness, and it's unwise to try to force yourself or others to be happy in all moments -- then I don't disagree.

Comment by esrogs on Turns Out Interruptions Are Bad, Who Knew? · 2020-06-12T00:38:34.334Z · score: 4 (2 votes) · LW · GW

The sweet spot so far for me was when I was working in a startup house (open office) with just a couple of other people in the room -- people with whom I was working closely. We'd spend most of the day working by ourselves, but would chat every now and then, usually to solve some particular problem we were working on.

It was just enough interaction to keep my social bar pretty full, while providing a minimum of distractions and interruptions.

Comment by esrogs on Turns Out Interruptions Are Bad, Who Knew? · 2020-06-12T00:32:55.523Z · score: 9 (5 votes) · LW · GW

Interesting to compare this to my own experience. When I'm by myself I often feel the draw of social media, which distracts me from work. But when I'm around other people (in an open office or otherwise), whom I could socialize with if I wanted to, then social media (and other internet distractions) are less of a draw, and I find it easier to focus.

So I agree that distractions are quite disruptive. But for me, being by myself is itself a source of distraction.

Comment by esrogs on ESRogs's Shortform · 2020-06-10T19:46:19.052Z · score: 4 (2 votes) · LW · GW


Comment by esrogs on ESRogs's Shortform · 2020-06-10T08:03:04.496Z · score: 6 (3 votes) · LW · GW

I was going to say that it's fine with me if my short call gets assigned and turns into a short position, but your comment on another thread about hard-to-borrow rates made me think I should look up the fees that my brokerage charges.

It looks like they're substantial. If I'm reading the table below correctly, IB is currently charging 0.4% per day to short NKLA, and it's been increasing.
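For a sense of scale, here is a back-of-the-envelope sketch. It assumes the 0.4% figure is a per-day rate on the market value of the borrowed shares, a fixed share price, and no compounding. The roughly $73 price is the one mentioned elsewhere in this thread. The 100-share position and 30-day holding period are made up.

```python
def borrow_cost(price, shares, daily_rate, days):
    # Simple (non-compounding) borrow fee, assuming a fixed share price
    # and a fee quoted as a fraction of position value per day.
    return price * shares * daily_rate * days

def annualized_simple(daily_rate, days_per_year=365):
    # Simple annualization of a per-day rate (no compounding).
    return daily_rate * days_per_year

print(borrow_cost(73.0, 100, 0.004, 30))  # about 876 dollars
print(annualized_simple(0.004))           # about 1.46, i.e. roughly 146% per year
```

If the broker's figure is actually an annualized rate, first convert it to a daily rate using the broker's own day-count convention, then apply the same function.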

Thanks for pointing this out!

> When the supply and demand attributes of a particular security are such that it becomes hard to borrow, the rebate provided by the lender will decline and may even result in a charge to the account. The rebate or charge will be passed on to the accountholder in the form of a higher borrow fee, which may exceed short sale proceeds interest credits and result in a net charge to the account. As rates vary by both security and date, IBKR recommends that customers utilize the Short Stock Availability tool accessible via the Support section in Client Portal/Account Management to view indicative rates for short sales.

Comment by esrogs on ESRogs's Shortform · 2020-06-10T02:02:38.692Z · score: 4 (2 votes) · LW · GW

Re: the options vs underlying, after chatting with a friend it seems like this might just be exactly what we'd expect if there is pent up demand to short, but shares aren't available -- there's an apparent arb available if you go long via options and short via the stock, but you can't actually execute the arb because shares aren't available to short. (And the SEC's uptick rule has been triggered.)
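Here is a sketch of why this looks like an arbitrage. It assumes European-style exercise and ignores borrow fees, interest, dividends, and early assignment. Suppose you short the stock at S0 while buying a call and selling a put at the same strike K, a combination sometimes called a reversal or reverse conversion. That locks in S0 - K - (C - P) per share, whatever the price at expiry. The inputs below are illustrative and echo the NKLA numbers in this thread: S0 = 73, K = 50, and equal hypothetical premiums, so C - P = 0.

```python
def reversal_pnl(s0, strike, call, put, s_expiry):
    # Short one share at s0; buy a call and sell a put at `strike`.
    # At expiry the long call minus the short put is worth s_expiry - strike
    # (one of them pays off, the other expires worthless).
    short_stock = s0 - s_expiry
    synthetic_long = (s_expiry - strike) - (call - put)
    return short_stock + synthetic_long

for price in (0.0, 50.0, 73.0, 200.0):
    print(price, reversal_pnl(73.0, 50.0, 20.0, 20.0, price))  # 23.0 each time
```

The catch is the short leg, which needs shares that can actually be borrowed.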

I'm thinking of taking advantage of the options prices via a short strangle (e.g. sell a long-dated $5 call and also sell a long-dated $105 put), but will want to think carefully about it because of the unbounded potential losses.
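To see the shape of the risk, here is the per-share P&L at expiry of selling the $5 call and the $105 put from the example. The premiums (70 and 35) are made up for illustration. With the price near $73, both strikes are in the money, so both options pay out between the strikes. In that range the P&L is flat at the total premium minus $100, the distance between the strikes. Above $105, the short call's loss grows without bound. The sketch ignores early assignment and margin.

```python
def short_strangle_pnl(s_expiry, call_strike, put_strike, call_premium, put_premium):
    # Per-share P&L at expiry of selling one call and one put.
    call_payout = max(s_expiry - call_strike, 0.0)
    put_payout = max(put_strike - s_expiry, 0.0)
    return call_premium + put_premium - call_payout - put_payout

for price in (0.0, 40.0, 73.0, 105.0, 200.0, 1000.0):
    print(price, short_strangle_pnl(price, 5.0, 105.0, 70.0, 35.0))
# P&L column: 0.0, 5.0, 5.0, 5.0, -90.0, -890.0
```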

Comment by esrogs on ESRogs's Shortform · 2020-06-10T01:01:29.764Z · score: 8 (4 votes) · LW · GW

Some weird stuff is happening with NKLA. That's the ticker for a startup called Nikola that did a reverse IPO last week (merging with the already-listed special-purpose acquisition company, VTIQ).

Nikola plans to sell various kinds of battery electric and hydrogen fuel cell trucks, with production scheduled to start in 2021.

When the reverse IPO was announced, the IPO price implied a valuation of NKLA at $3.3 billion. However, before the deal went through, the price of VTIQ rose from $10 in March to over $30 last week.

Then, after the combined company switched to the new ticker, NKLA, the price continued to rise, closing on Friday (June 5th) at $35, doubling on Monday to over $70 at close, and then continuing to rise to over $90 after hours, for a market cap over $30 billion, higher than the market cap of Ford.

The price has come down a bit today, and sits at $73 at the time I am writing this.

I have not investigated this company in detail. But commentary from some amateur analysts whom I follow makes it sound to me like the hype has far outpaced the substance.

On Monday, I tried shorting at the open (via orders I'd placed the night before), but luckily for me, no shares were available to short (lucky since the price doubled that day). I tried again later in the day, and there were still no shares available.

It appears that the limited availability of shares to short has pushed traders into bidding up the prices of puts. If I'm reading the options chain right, it appears that a Jan 2022 synthetic long at a $50 strike (buying a $50 strike call and selling a $50 strike put) can be bought for roughly $0. Since the value of a synthetic long should be roughly equal to the price of the stock minus the strike, this implies a price of about $50 for the stock, in contrast to the $70+ price if you buy the stock directly.
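Here is the parity relation behind that reasoning, as a minimal sketch. It assumes European options on a non-dividend-paying stock: C - P = S - K·e^(-rT), so the options imply S = C - P + K·e^(-rT). Listed US equity options are American-style, so for them this holds only approximately, as bounds. For long-dated options, the interest term slightly lowers the implied price. The premiums and the 1% rate below are hypothetical.

```python
import math

def parity_implied_price(call, put, strike, rate=0.0, years=0.0):
    # European put-call parity, no dividends:
    #   C - P = S - K * exp(-r * T)  =>  S = C - P + K * exp(-r * T)
    return call - put + strike * math.exp(-rate * years)

print(parity_implied_price(20.0, 20.0, 50.0))             # 50.0
print(parity_implied_price(20.0, 20.0, 50.0, 0.01, 1.6))  # about 49.21
```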

That price discrepancy is so big that it seems like there's a significant chance I'm missing something. Can anybody explain why those options prices might actually make sense? Am I just doing the options math wrong? Is there some factor I'm not thinking of?

Comment by esrogs on Is a near-term, self-sustaining Mars colony impossible? · 2020-06-07T00:16:45.100Z · score: 2 (1 votes) · LW · GW

Not sure what Lincoln had in mind regarding market forces, but one reason the cost to sustain the colony should shrink over time is just tech improvement. Operating the colony (at a given standard of living) should get cheaper over time.

Comment by esrogs on Is a near-term, self-sustaining Mars colony impossible? · 2020-06-04T20:53:22.242Z · score: 5 (3 votes) · LW · GW

> And I’m just thinking about getting a Mars colony at all. I do think “self-sustaining” is a ridiculously high bar, much higher than simply having some people living on Mars.

This is one of my key takeaways so far from this (and from reading through Carl's three recent posts on the topic) -- that "colony" and "self-sustaining colony" are two very different goals.

Comment by esrogs on An overview of 11 proposals for building safe advanced AI · 2020-06-01T19:08:54.405Z · score: 2 (1 votes) · LW · GW

Got it, thank you!

Comment by esrogs on An overview of 11 proposals for building safe advanced AI · 2020-06-01T05:33:24.985Z · score: 13 (4 votes) · LW · GW

I see. So, restating in my own terms -- outer alignment is in fact about whether getting what you asked for is good, but for the case of prediction, the malign universal prior argument says that "perfect" prediction is actually malign. So this would be a case of getting what you wanted / asked for / optimized for, but that not being good. So it is an outer alignment failure.

Whereas an inner alignment failure would necessarily involve not hitting optimal performance at your objective. (Otherwise it would be an inner alignment success, and an outer alignment failure.)

Is that about right?

Comment by esrogs on An overview of 11 proposals for building safe advanced AI · 2020-06-01T02:22:01.702Z · score: 10 (5 votes) · LW · GW

Great post, thank you!

However, I think I don't quite understand the distinction between inner alignment and outer alignment, as they're being used here. In particular, why would the possible malignity of the universal prior be an example of outer alignment rather than inner?

I was thinking of outer alignment as being about whether, if a system achieves its objective, that is what you wanted, whereas inner alignment was about whether it's secretly optimizing for something other than the stated objective in the first place.

From that perspective, wouldn't malignity in the universal prior be a classic example of inner misalignment? You wanted unbiased prediction, and if you got it that would be good (so it's outer-aligned). But it turns out you got something that looked like a predictor up to a point, and then turned out to be an optimizer (inner misalignment).

Have I misunderstood outer or inner alignment, or what malignity of the universal prior would mean?

Comment by esrogs on GPT-3: a disappointing paper · 2020-05-31T20:59:51.304Z · score: 4 (3 votes) · LW · GW

> Assumption #2 is entirely correct

What makes you conclude this?

Comment by esrogs on ESRogs's Shortform · 2020-05-24T21:52:11.682Z · score: 4 (2 votes) · LW · GW


Comment by esrogs on Open & Welcome Thread - December 2019 · 2020-05-24T00:21:02.114Z · score: 2 (1 votes) · LW · GW


Comment by esrogs on Open & Welcome Thread - December 2019 · 2020-05-23T20:57:24.568Z · score: 2 (1 votes) · LW · GW


> wondering if the community here thought Hume was an idiot

Just searched old posts, and apparently at least one person on LW thought Hume was a candidate for the Greatest Philosopher in History. That's an obscure post with only one upvote, though, so it can't be considered representative of the community's views.

In general I think this community tends to be not too concerned with evaluating long-dead philosophers, and instead prefers to figure out what we can, informed by all the knowledge we currently have available from across scientific disciplines.

Historical philosophers may have been bright and made good arguments in their time. But they were starting from a huge disadvantage relative to us if they didn't have access to a modern understanding of evolution, cognitive biases, logic and computability, etc.

For a fairly representative account of how LW-ers view mainstream philosophy, see "Less Wrong Rationality and Mainstream Philosophy" and "Philosophy: A Diseased Discipline".

wondering if the community here thought... the latest findings about emotions being a necessary part of decision-making horrifying

I'm not sure exactly what you're referring to. But in general I think the community is pretty on board with the idea that our brains do a lot besides explicit, verbal, deductive reasoning, and that this is useful.

And also that you'll reason best if you can set up a sort of dialogue between your emotional, intuitive judgments and your explicit verbal reasoning. Each can serve as a check on the other. Neither is to be completely trusted. And you'll do best when you can make use of both. (See Kahneman's work on System 1 and System 2 thinking.)

Comment by esrogs on ESRogs's Shortform · 2020-05-23T20:38:24.463Z · score: 6 (3 votes) · LW · GW

I'm looking for an old post where Eliezer makes the basic point that we should be able to do better than intellectual figures of the past, because we have the "unfair" advantage of knowing all the scientific results that have been discovered since then.

I think he cites in particular the heuristics and biases literature as something that thinkers wouldn't have known about 100 years ago.

I don't remember if this was the main point of the post it was in, or just an aside, but I'm pretty confident he made a point like this at least once, and in particular commented on how the advantage we have is "unfair" or something like that, so that we shouldn't feel at all sheepish about declaring old thinkers wrong.

Anybody know what post I'm thinking of?

Comment by esrogs on A Problem With Patternism · 2020-05-22T21:53:21.960Z · score: 2 (1 votes) · LW · GW
"Which are relevant, and which are most important?"
That’s precisely the subjective part.

They could be objective, given a context. Now the choice of context may be a matter of taste or preference. But given a context that we want to ask questions about, we might be able to get objective answers. (E.g. will this hypothetical future person think like me?)

But I agree that some subjectivity is involved somewhere in the process.

Comment by esrogs on Reflective Complaints · 2020-05-22T01:03:53.427Z · score: 4 (4 votes) · LW · GW

Romeo Stevens, the king of pith.

Comment by esrogs on A Problem With Patternism · 2020-05-22T01:00:14.984Z · score: 4 (2 votes) · LW · GW

Ah, maybe I misunderstood what you meant when you said you would throw it away. I thought maybe you meant you'd discard it in favor of some other preferred theory. Or in favor of whatever you believed in before you learned about patternism.

And depending on what those theories are, that seemed like it might be a bad move, from my perspective.

But if instead your attitude is more like picking up a book, only to find that the author got only halfway through writing it, and you're going to set it aside until it's done so you can read the whole story, then it seems to me like there's nothing wrong with that.

Comment by esrogs on A Problem With Patternism · 2020-05-22T00:54:10.059Z · score: 3 (2 votes) · LW · GW
but for now is the superiority of subjective measuring the viewpoint I’ll accept

I didn't follow this. You're saying for now you're leaning towards a subjective measuring viewpoint? Which one?

I’m willing to give up on trying to find some impartial way of measuring this

Depending on what you mean by "impartial", I might agree that that's the right move. But I think a good theory might end up looking more like special relativity, where time, speed, and simultaneity are observer-dependent (rather than universal), but in a well-defined way that we can speak precisely about.

I assume personal identity will be a little more complicated than that, since minds are more complicated than beams of light. But I just wanted to highlight that as an example where we went from believing in a universal quantity to one that was relative, but didn't have to totally throw up our hands and declare it all meaningless.

I’m at a loss to how you could build on it honestly.

FWIW, if I were to spend some time on it, I'd maybe start by thinking through all the different ways that we use personal identity. Like, how the concept interacts with things. For example, partly it's about what I anticipate experiencing next. Partly it's about which beings' future experiences I value. Partly it's about how similar that entity is to me. Partly it's about how much I can control what happens to that future entity. Partly it's about what that entity's memories will be. Etc., etc.

Just keep making the list, and then analyze various scenarios and thought experiments and think through how each of the different forms of personal identity applies. Which are relevant, and which are most important?

Then maybe you have a big long list of attributes of identity, and a big long list of how decision-relevant they are for various scenarios. And then maybe you can do dimensionality reduction and cluster them into a few meaningful categories that are individually amenable to quantification (similar to how the Big Five personality system was developed).
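To make the clustering step a bit more concrete, here's a toy sketch under some loud assumptions. The attribute names and scores are made up (hypothetical relevance ratings from 0 to 10 across four imagined scenarios). And rather than real factor analysis, which is what the Big Five researchers used, it does something much cruder: two attributes land in the same group whenever their relevance scores correlate above an arbitrarily chosen threshold of 0.8 across scenarios (single-linkage grouping).

```python
from __future__ import annotations

from itertools import combinations
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cluster_attributes(scores: dict[str, list[float]], threshold: float = 0.8) -> list[set[str]]:
    """Single-linkage grouping: attributes share a group if a chain of
    pairwise correlations above `threshold` connects them."""
    parent = {name: name for name in scores}

    def find(name: str) -> str:
        # Follow parent links to the group's representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(scores, 2):
        if pearson(scores[a], scores[b]) > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for name in scores:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical relevance of each identity attribute in four made-up scenarios.
scores = {
    "anticipation": [9, 2, 7, 1],
    "memories": [8, 1, 7, 2],
    "valuing": [3, 9, 2, 8],
    "control": [2, 8, 3, 9],
    "similarity": [5, 5, 9, 1],
}

for group in cluster_attributes(scores):
    print(sorted(group))
```

With these invented numbers it prints three groups: anticipation with memories, control with valuing, and similarity on its own. A serious attempt would want many more scenarios and a proper method like factor analysis or PCA rather than a hand-picked threshold.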

That doesn't sound so hard, does it? ;-)