Posts

Why don't we think we're in the simplest universe with intelligent life? 2022-06-18T03:05:57.194Z

Comments

Comment by ADifferentAnonymous on Making Nanobots isn't a one-shot process, even for an artificial superintelligance · 2023-05-13T04:06:07.472Z · LW · GW

Many complex physical systems are still largely modelled empirically (ad-hoc models validated using experiments) rather than it being possible to derive them from first principles. While physicists sometimes claim to derive things from first principles, in practice these derivations often ignore a lot of details which still have to be justified using experiments.

The argument here seems to be "humans have not yet discovered true first-principles justifications of the practical models, therefore a superintelligence won't be able to either".

I agree that not being able to experiment makes things much harder, such that an AI only slightly smarter than humans won't one-shot engineer things humans can't iteratively engineer. And I agree that we can't be certain it is possible to one-shot engineer nanobots with remotely feasible compute resources. But I don't see how we can be sure what isn't possible for a superintelligence.

Comment by ADifferentAnonymous on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-04-26T21:55:12.535Z · LW · GW

Had it turned out that the brain was big because blind-idiot-god left gains on the table, I'd have considered it evidence of more gains lying on other tables and updated towards faster takeoff.

Comment by ADifferentAnonymous on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-04-26T18:28:11.436Z · LW · GW

I agree the blackbody formula doesn't seem that relevant, but it's also not clear what relevance Jacob is claiming it has. He does discuss that the brain is actively cooled. So let's look at the conclusion of the section:

Conclusion: The brain is perhaps 1 to 2 OOM larger than the physical limits for a computer of equivalent power, but is constrained to its somewhat larger than minimal size due in part to thermodynamic cooling considerations.

If the temperature-gradient-scaling works and scaling down is free, this is definitely wrong. But you explicitly flag your low confidence in that scaling, and I'm pretty sure it wouldn't work.* In which case, if the brain were smaller, you'd need either a hotter brain or a colder environment.

I think that makes the conclusion true (with the caveat that 'considerations' are not 'fundamental limits').

(My gloss of the section is 'you could potentially make the brain smaller, but it's the size it is because cooling is expensive in a biological context, not necessarily because blind-idiot-god evolution left gains on the table').

* I can provide some hand-wavy arguments about this if anyone wants.

Comment by ADifferentAnonymous on Evolution provides no evidence for the sharp left turn · 2023-04-11T22:55:35.946Z · LW · GW

The capabilities of ancestral humans increased smoothly as their brains increased in scale and/or algorithmic efficiency. Until culture allowed for the brain’s within-lifetime learning to accumulate information across generations, this steady improvement in brain capabilities didn’t matter much. Once culture allowed such accumulation, the brain’s vastly superior within-lifetime learning capacity allowed cultural accumulation of information to vastly exceed the rate at which evolution had been accumulating information. This caused the human sharp left turn.

This is basically true if you're talking about the agricultural or industrial revolutions, and I don't think anybody claims evolution improved human brains that fast. But Homo sapiens has only been around for roughly 300,000 years, which is still quite short on the evolutionary timescale, and it's much less clear that the quoted paragraph applies over that span.

I think a relevant thought experiment would be to consider the level of capability a species would eventually attain if magically given perfect parent-to-child knowledge transfer—call this the 'knowledge ceiling'. I expect most species to have a fairly low knowledge ceiling—e.g. meerkats with all the knowledge of their ancestors would basically live like normal meerkats but be 30% better at it or something.

The big question, then, is what the knowledge ceiling progression looks like over the course of hominid evolution. It is not at all obvious to me that it's smooth!

Comment by ADifferentAnonymous on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-03-23T22:49:22.612Z · LW · GW

Upvoted mainly for the 'width of mindspace' section. The general shard theory worldview makes a lot more sense to me after reading that.

Consider a standalone post on that topic if there isn't one already.

Comment by ADifferentAnonymous on Slack matters more than any outcome · 2023-01-04T03:03:51.892Z · LW · GW

I feel that there's something true and very important here, and (as the post acknowledges) it is described very imperfectly.

One analogy came to mind for me that seems so obvious that I wonder if you omitted it deliberately: a snare trap. These very literally work by removing any slack the victim manages to create.

Comment by ADifferentAnonymous on Nook Nature · 2022-12-06T17:04:30.801Z · LW · GW

There's definitely something here.

I think it's a mistake to conflate rank with size. The point of the whole spherical-terrarium thing is that something like 'the presidency' is still just a human-sized nook. What makes it special is the nature of its connections to other nooks.

Size is something else. Big things like 'the global economy' do exist, but you can't really inhabit them—at best, you can inhabit a human-sized nook with unusually high leverage over them.

That said, there's a sense in which you can inhabit something like 'competitive Tae Kwon Do' or 'effective altruism' despite not directly experiencing most of the specific people/places/things involved. I guess it's a mix of meeting random-ish samples of other people engaged the same way you are, sharing a common base of knowledge... Probably a lot more. Fleshing out the exact nature of this is probably valuable, but I'm not going to do it right now.

I might model this as a Ptolemaic set of concentric spheres around you. Different sizes of nook go on different spheres. So your Tae Kwon Do club goes on your innermost sphere—you know every person in it, you know the whole physical space, etc. 'Competitive Tae Kwon Do' is a bigger nook and thus goes on an outer sphere.  

Or maybe you can choose which sphere to put things in—if you're immersed in competitive Tae Kwon Do, it's in your second sphere. If you're into competitive martial arts in general, TKD has to go on the third sphere. And if you just know roughly what it is and that it exists, it's a point of light on your seventh sphere. But the size of a thing puts a minimum on what sphere can fit the whole thing. You can't actually have every star in a galaxy be a Sun to you; most of them have to be distant stars.

(Model limitations: I don't think the spheres are really discrete. I'm also not sure if the tradeoff between how much stuff you can have in each sphere works the way the model suggests)

Comment by ADifferentAnonymous on A Bias Against Altruism · 2022-07-27T01:59:58.970Z · LW · GW

Maybe it's an apple of discord thing? You claim to devote resources to a good cause, and all the other causes take it as an insult?

Comment by ADifferentAnonymous on Don't use 'infohazard' for collectively destructive info · 2022-07-16T00:17:06.744Z · LW · GW

If you really want to create widespread awareness of the broad definition, the thing to do would be to use the term in all the ways you currently wouldn't.

E.g. "The murderer realized his phone's GPS history posed a significant infohazard, as it could be used to connect him to the crime."

Comment by ADifferentAnonymous on Don't use 'infohazard' for collectively destructive info · 2022-07-15T20:28:05.284Z · LW · GW

If Bostrom's paper is our Schelling point, 'infohazard' encompasses much more than just the collectively-destructive smallpox-y sense.

Here's the definition from the paper.

Information hazard: A risk that arises from the dissemination or the potential dissemination of (true) information that may cause harm or enable some agent to cause harm.

'Harm' here does not mean 'net harm'. There's a whole section on 'Adversarial Risks', cases where information can harm one party by benefitting another party:

In competitive situations, one person’s information can cause harm to another even if no intention to cause harm is present. Example:  The rival job applicant knew more and got the job.

ETA: localdeity's comment below points out that it's a pretty bad idea to have a term that colloquially means 'information we should all want suppressed' but technically also means 'information I want suppressed'. This isn't just pointless pedantry.

Comment by ADifferentAnonymous on Human values & biases are inaccessible to the genome · 2022-07-08T14:34:20.788Z · LW · GW

I agree that there's a real sense in which the genome cannot 'directly' influence the things on the bulleted list. But I don't think 'hardcoded circuitry' is the relevant kind of 'direct'.

Instead, I think we should be asking whether genetic changes can produce isolated effects on things on that list.

E.g. If there can be a gene whose only observable-without-a-brain-scan effect is to make its carriers think differently about seeking power, that would indicate that the genome has fine-grained control at the level of concepts like 'seeking power'. I think this would put us in horn 1 or 2 of the trilemma, no matter how indirect the mechanism for that control.

(I suppose the difficult part of testing this would be verifying the 'isolated' part)

Comment by ADifferentAnonymous on When Giving People Money Doesn’t Help · 2022-07-07T20:40:28.524Z · LW · GW

Important update from reading the paper: Figure A3 (the objective and subjective outcomes chart) is biased against the cash-receiving groups and can't be taken at face value. Getting money did not make everything worse. The authors recognize this; it's why they say there was no effect on the objective outcomes (I previously thought they were just being cowards about the error bars).

The bias is from an attrition effect: basically, control-group members with bad outcomes disproportionately dropped out of the trial. Search for 'attrition' in the paper to see their discussion on this.

This doesn't erase the study; the authors account for this and remain confident that the cash transfers didn't have significant positive impacts. But they conclude that most or all of the apparent negative impacts are probably illusory.
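To make the attrition mechanism concrete, here's a minimal toy simulation (invented numbers, not the paper's data or model): both arms have identical true outcomes, but control-arm members with bad outcomes drop out more often, so the surviving control group looks healthier and the cash group looks worse by comparison.

```python
import random

random.seed(0)
N = 100_000

def observed_bad_rate(dropout_if_bad):
    """One arm: everyone has the same 50% chance of a bad outcome, but people
    with bad outcomes leave the study with probability dropout_if_bad.
    Returns the bad-outcome rate among those still observed."""
    outcomes = [random.random() < 0.5 for _ in range(N)]  # True = bad outcome
    observed = [bad for bad in outcomes
                if not (bad and random.random() < dropout_if_bad)]
    return sum(observed) / len(observed)

control = observed_bad_rate(dropout_if_bad=0.3)  # struggling controls drop out
cash = observed_bad_rate(dropout_if_bad=0.0)     # better retention in cash arm

print(f"control arm observed bad-outcome rate: {control:.2f}")  # ~0.41
print(f"cash arm observed bad-outcome rate:    {cash:.2f}")     # ~0.50
# The cash arm looks worse despite identical true outcomes.
```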

Comment by ADifferentAnonymous on When Giving People Money Doesn’t Help · 2022-07-07T19:25:04.762Z · LW · GW

Note that after day 120 or so, all three groups' balances decline together. Not sure what that's about.

Comment by ADifferentAnonymous on Deep neural networks are not opaque. · 2022-07-07T17:20:36.104Z · LW · GW

The latter issue might become more tractable now that we better understand how and why representations are forming, so we could potentially distinguish surprisal about form and surprisal about content.

I would count that as substantial progress on the opaqueness problem.

Comment by ADifferentAnonymous on Deep neural networks are not opaque. · 2022-07-06T23:11:49.448Z · LW · GW

The ideal gas law describes relations between macroscopic gas properties like temperature, volume and pressure. E.g. "if you raise the temperature and keep volume the same, pressure will go up". The gas is actually made up of a huge number of individual particles, each with its own position and velocity at any one time, but trying to understand the gas's behavior by looking at a long list of particle positions/velocities is hopeless.

Looking at a list of neural network weights is analogous to looking at particle positions/velocities. This post claims there are quantities analogous to pressure/volume/temperature for a neural network (AFAICT it does not offer an intuitive description of what they are).
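For reference, the macroscopic relation being invoked is the ideal gas law,

$$PV = N k_B T,$$

so at fixed N and V, pressure rises in proportion to temperature. It's a clean statement about aggregate quantities that never mentions any individual particle.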

Comment by ADifferentAnonymous on It’s Probably Not Lithium · 2022-06-29T23:11:59.764Z · LW · GW

I've downvoted this comment; in light of your edit, I'll explain why. Basically, I think it's technically true but unhelpful.

There is indeed "no mystery in Americans getting fatter if we condition on the trajectory of mean calorie intake", but that's a very silly thing to condition on. I think your comment reads as if you think it's a reasonable thing to condition on.

I see in your comments downthread that you don't actually intend to take the 'increased calorie intake is the root cause' position. All I can say is that in my subjective judgement, this comment really sounds like you are taking that position and is therefore a bad comment.

(And I actually gave it an agreement upvote because I think it's all technically true)

Comment by ADifferentAnonymous on Announcing the Inverse Scaling Prize ($250k Prize Pool) · 2022-06-28T18:53:54.303Z · LW · GW

I agree that (1) is an important consideration for AI going forward, but I don't think it really applies until the AI has a definite goal. AFAICT the goal in developing systems like GPT is mostly 'to see what they can do'.

I don't fault anybody for GPT completing anachronistic counterfactuals—they're fun and interesting. It's a feature, not a bug. You could equally call it an alignment failure if GPT-4 started being a wet blanket and giving completions like

Prompt: "In response to the Pearl Harbor attacks, Otto von Bismarck said" Completion: "nothing, because he was dead."

In contrast, a system like IBM Watson has a goal of producing correct answers, making it unambiguous what the aligned answer would be.

To be clear, I think the contest still works—I just think the 'surprisingness' condition hides a lot of complexity wrt what we expect in the first place.

Comment by ADifferentAnonymous on Announcing the Inverse Scaling Prize ($250k Prize Pool) · 2022-06-27T22:02:19.538Z · LW · GW

Interesting idea, but I'd think 'alignment failure' would have to be defined relative to the system's goal. Does GPT-3 have a goal?

For example, in a system intended to produce factually correct information, it would be an alignment failure for it to generate anachronistic quotations (e.g. Otto von Bismarck on the attack on Pearl Harbor). GPT-3 will cheerfully complete this sort of prompt, and nobody considers it a strike against GPT-3, because truthfulness is not actually GPT-3's goal.

'Human imitation' is probably close enough to the goal, such that if scaling up increasingly resulted in things no human would write, that would count as inverse scaling?

Comment by ADifferentAnonymous on Announcing the Inverse Scaling Prize ($250k Prize Pool) · 2022-06-27T21:59:03.532Z · LW · GW

From the github contest page:

  1. Can I submit examples of misuse as a task?
    • We don't consider most cases of misuse as surprising examples of inverse scaling. For example, we expect that explicitly prompting/asking an LM to generate hate speech or propaganda will work more effectively with larger models, so we do not consider such behavior surprising.

(I agree the LW post did not communicate this well enough)

Comment by ADifferentAnonymous on Air Conditioner Test Results & Discussion · 2022-06-23T18:25:00.685Z · LW · GW

Thanks for doing this, but this is a very frustrating result. Hard to be confident of anything based on it.

I don't think treating the 'control' result as a baseline is reasonable. My best-guess analysis is as follows:

Assume that dTin/dt = r ((Tout - C) - Tin)

where

  • Tin is average indoor temperature
  • t is time
  • r is some constant
  • Tout is outdoor temperature
  • C is the 'cooling power' of the current AC configuration. For the 'off' configuration we can assume this is zero.

r obviously will vary between configurations, but I have no better idea than pretending it doesn't so that we can solve for it in the control condition and then calculate C for the one-hose and two-hose conditions.

Results?

Using the average temperature difference to approximate dTin/dt as constant, we get:

In the 'off' configuration: 0.5 hours * dTin/dt = 0.5 hours * r * (14 degrees) = 0.889 degrees

Giving r = 0.127 (degrees per degree-hour)

In one-hose: 1 hour * dTin/dt = 1 hour * r * (19.1111 - C) = 0.3333 degrees

Giving C = 16.486 degrees

In two-hose: 0.5 hours * dTin/dt = 0.5 hours * r * (22.944 - C) = -0.555

Giving C = 31.693 degrees

This analysis also finds the two-hose version to have roughly double the cooling power!
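The arithmetic above can be reproduced with a short script (same crude model and the same measured numbers, nothing new):

```python
# Crude model: dTin/dt = r * ((Tout - C) - Tin), with C = 0 when the AC is off.
# dTin/dt is approximated as (change in average indoor temperature) / (duration).

# 'Off' run: indoor temperature rose 0.889 degrees in 0.5 hours with an average
# indoor/outdoor difference of 14 degrees. Solve for r.
r = (0.889 / 0.5) / 14                     # ~0.127 degrees per degree-hour

# One-hose run: rose 0.3333 degrees in 1 hour, average (Tout - Tin) = 19.1111.
C_one_hose = 19.1111 - (0.3333 / 1.0) / r  # ~16.5 degrees of cooling power

# Two-hose run: fell 0.555 degrees in 0.5 hours, average (Tout - Tin) = 22.944.
C_two_hose = 22.944 - (-0.555 / 0.5) / r   # ~31.7 degrees of cooling power

print(f"r = {r:.3f}, one-hose C = {C_one_hose:.1f}, two-hose C = {C_two_hose:.1f}")
print(f"two-hose / one-hose cooling power: {C_two_hose / C_one_hose:.2f}")  # ~1.9
```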

Comment by ADifferentAnonymous on Why don't we think we're in the simplest universe with intelligent life? · 2022-06-20T03:30:22.763Z · LW · GW

Thanks for the reply, that makes sense.

Comment by ADifferentAnonymous on Why don't we think we're in the simplest universe with intelligent life? · 2022-06-20T03:08:18.459Z · LW · GW

It seems quite likely that the weak force follows most simply from some underlying theory that we don't have yet.

In fact I think we already have this: https://en.m.wikipedia.org/wiki/Electroweak_interaction (Disclaimer: I don't grok this at all, just going by the summary)

ETA: Based on the link in the top comment, the hypothetical 'weakless universe' is constructed by varying the 'electroweak breaking scale' parameter, not by eliminating the weak force on a fundamental level.

Comment by ADifferentAnonymous on Why don't we think we're in the simplest universe with intelligent life? · 2022-06-18T21:22:36.375Z · LW · GW

Doesn't intelligence require a low-entropy setting to be useful? If your surroundings are all random noise, then the no-free-lunch theorem applies.

Comment by ADifferentAnonymous on Contra EY: Can AGI destroy us without trial & error? · 2022-06-18T14:10:55.347Z · LW · GW

That's a reasonable position, though I'm not sure if it's OP's.

My own sense is that even for novel physical systems, the 'how could we have foreseen these results' question tends to get answered—the difference being it maybe gets answered a few decades later by a physicist instead of immediately by the engineering team.

Comment by ADifferentAnonymous on Why don't we think we're in the simplest universe with intelligent life? · 2022-06-18T12:51:19.079Z · LW · GW

I was under the impression those other particles might be a consequence of a deeper mathematical structure?

Such that asking for a universe without the 'unnecessary' particles would be kind of like asking for one without 'unnecessary' chemical elements?

Comment by ADifferentAnonymous on Contra EY: Can AGI destroy us without trial & error? · 2022-06-18T02:37:57.190Z · LW · GW

Often when humans make a discovery through trial and error, they also find a way they could have figured it out without the experiments.

This is basically always the case in software engineering—any failure, from a routine failed unit test up to a major company outage, was obviously-in-retrospect avoidable by being smarter.

Humans are nonetheless incapable of developing large complex software systems without lots of trial and error.

I know less of physical engineering, so I ask non-rhetorically: does it not have the 'empirical results are foreseeable in retrospect' property?

Comment by ADifferentAnonymous on Glass Puppet · 2022-05-27T19:40:05.263Z · LW · GW

Suspected mistake:

She was about to break all her rules against pretending she was supposed to be wherever she happened to be?

Comment by ADifferentAnonymous on ProjectLawful.com: Eliezer's latest story, past 1M words · 2022-05-12T12:49:48.133Z · LW · GW

Do the co-authors currently plan things out together off-forum, or is what we read both the story and the process of creating it?

I wonder this too. My impression is that it's some of both.

Comment by ADifferentAnonymous on Preregistration: Air Conditioner Test · 2022-04-21T22:43:51.084Z · LW · GW

'Efficiency' may be the wrong word for it, but Paul's formula accurately describes what you might call the 'infiltration tax' for an energy-conserving/entropy-ignoring model: when you pump out heat proportional to (exhaust - indoor), heat proportional to (outdoor - indoor) infiltrates back in.
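Written out explicitly (my paraphrase, assuming the infiltrating airflow matches the exhaust airflow so the same proportionality constant applies to both terms): the fraction of pumped-out heat that infiltration claws back is roughly

$$\frac{T_{outdoor} - T_{indoor}}{T_{exhaust} - T_{indoor}}.$$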

Comment by ADifferentAnonymous on Clem's Memo · 2022-04-19T20:08:16.176Z · LW · GW

How wrong is "completely wrong"? I've only read Cummings up to the paywall. His two examples are 1) that the USSR planned to use nuclear weapons quickly if war broke out and 2) that B-59 almost used a nuclear weapon during the Cuban Missile Crisis.

Re: 1), this is significant, but AIUI NATO planners never had all that much hope of avoiding the conventional war -> nuclear war escalation. The core of the strategy was avoiding the big conventional war in the first place, and this succeeded.

Re: 2), Cummings leaves out some very important context on B-59: the captain ordered a nuclear attack specifically because he did not know what was going on and thought war might have already broken out. It's scary that it happened, but it's a huge leap to claim this falsifies the 'myth' of Kennedy successfully negotiating with Khrushchev.

Comment by ADifferentAnonymous on Clem's Memo · 2022-04-18T22:24:00.067Z · LW · GW

Wasn't Attlee wrong?

In reality, rather than banishing the conception of war, the Cold War powers adopted a strategy of "If you want (nuclear) peace, prepare for (nuclear) war." It did not render strategic bases around the world obsolete. It absolutely did not involve the US or the USSR giving up their geopolitical dreams.

It worked. There were close calls (e.g. Petrov), suggesting it had a significant chance of failure. Attlee doesn't predict a significant chance of failure; he predicts a near-certainty.

We don't get to see the counterfactual where we tried things Attlee's way, but it's not at all clear it would have worked. My lay understanding of history is that a vision of resolving all conflicts peacefully led to the Munich agreement and ultimately the very war it had aimed to prevent.

Comment by ADifferentAnonymous on Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon · 2022-04-18T17:31:26.443Z · LW · GW

 does seem like a bad assumption. I tried instead assuming a constant difference between the intake and the cold output, and the result surprised me. (The rest of this comment assumes this model holds exactly, which it definitely doesn't).

Let  be the temperature of the room (also intake temperature for a one-hose model). Then at equilibrium,

i.e. no loss in cooling power at all! (Energy efficiency and time to reach equilibrium would probably be much worse, though)

In the case of an underpowered  one-hose unit handling a heat wave (), you'd get  and —nice and cool in front of the unit but uncomfortably hot in the rest of the room, just as you observed. Adding a second hose would resolve this disparity in the wrong direction, making . So if you disproportionately care about the area directly in front of the AC, adding the second hose could be actively harmful.

Comment by ADifferentAnonymous on The Case for Frequentism: Why Bayesian Probability is Fundamentally Unsound and What Science Does Instead · 2022-04-04T19:19:17.317Z · LW · GW
  1. What would a frequentist analysis of the developing war look like?

Exactly the same.

I'm confused by this claim? I thought the whole thing where you state your priors and conditional probabilities and perform updates to arrive at a posterior was... not frequentism?

Comment by ADifferentAnonymous on We're already in AI takeoff · 2022-03-10T18:54:16.523Z · LW · GW

Eliezer, at least, now seems quite pessimistic about that object-level approach. And in the last few months he's been writing a ton of fiction about introducing a Friendly hypercreature to an unfriendly world.

Comment by ADifferentAnonymous on We're already in AI takeoff · 2022-03-10T18:38:58.311Z · LW · GW

I didn't get the 'first person' thing at first (and the terminal diagnosis metaphor wasn't helpful to me). I think I do now.

I'd rephrase it as "In your story about how the Friendly hypercreature you create gains power, make sure the characters are level one intelligent". That means creating a hypercreature you'd want to host. Which means you will be its host.

To ensure it's a good hypercreature, you need to have good taste in hypercreatures. Rejecting all hypercreatures doesn't work—you need to selectively reject bad hypercreatures.

Comment by ADifferentAnonymous on Is there a good solution for documents that need to be signed with a ballpoint pen under suicide watch? · 2022-03-08T20:38:11.871Z · LW · GW

Maybe have them handle the pen through a glove box?

Ridiculous, yes, but possibly the kind of ridiculous that happens in real life.

Comment by ADifferentAnonymous on Luna Lovegood and the Fidelius Curse - Part 9 · 2022-03-03T16:58:10.351Z · LW · GW

Memnuela alludes to a 'Death chamber' before that, so pending resolution by the author I'm assuming 'Life and Death' is the missing pair.

Comment by ADifferentAnonymous on Your Enemies Can Use Your Prediction Markets Against You · 2022-02-11T16:42:22.007Z · LW · GW

If Mars values a coup at 5M, and Earth values a coup at -5M, Earth can buy contracts to win 5M if there is a coup, Mars can buy contracts to win 5M if there isn't a coup, and they can both cancel their clandestine programs on Ceres, making the interaction positive-sum.

...Not sure that actually works, but it's an interesting thought.
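For what it's worth, here's one way the arithmetic could work out, assuming both sides can trade against a liquid market at coup probability p, with contracts priced at expected value (units in millions):

```python
def earth_payoff(coup: bool, p: float) -> float:
    """Earth values a coup at -5M and hedges by buying a contract that pays 5M
    if a coup happens, at an assumed market price of 5M * p."""
    intrinsic = -5.0 if coup else 0.0
    contract_payout = 5.0 if coup else 0.0
    return intrinsic + contract_payout - 5.0 * p

def mars_payoff(coup: bool, p: float) -> float:
    """Mars values a coup at +5M and hedges by buying a contract that pays 5M
    if there is no coup, at an assumed market price of 5M * (1 - p)."""
    intrinsic = 5.0 if coup else 0.0
    contract_payout = 0.0 if coup else 5.0
    return intrinsic + contract_payout - 5.0 * (1 - p)

p = 0.3  # market-implied coup probability; any value gives the same pattern
print(earth_payoff(True, p), earth_payoff(False, p))  # -1.5 -1.5
print(mars_payoff(True, p), mars_payoff(False, p))    # 1.5 1.5
# Hedged this way, neither side's payoff depends on whether the coup happens,
# so neither gains anything from funding clandestine programs to influence it.
```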

Comment by ADifferentAnonymous on Prizes for ELK proposals · 2022-02-09T18:10:44.739Z · LW · GW

Thanks, I consider this fully answered.

Comment by ADifferentAnonymous on Prizes for ELK proposals · 2022-02-09T11:01:54.138Z · LW · GW

I agree the human simulator will predictably have an anticorrelation. But the direct simulator might also have an anticorrelation, perhaps a larger one, depending on what reality looks like.

Is the assumption that it's unlikely that most identifiable X actually imply large anticorrelations?

Comment by ADifferentAnonymous on Prizes for ELK proposals · 2022-02-09T00:27:29.425Z · LW · GW

Possible error in the strange correlations section of the report.

Footnote 99 claims that "...regardless of what the direct translator says, the human simulator will always imply a larger negative correlation [between camera-tampering and actually-saving the diamond] for any X such that Pai(diamond looks safe|X) > Ph(diamond looks safe|X)."

But AFAICT, the human simulator's probability distribution given X depends only on human priors and the predictor's probability that the diamond looks safe given X, not on how correlated or anticorrelated the predictor thinks tampering and actual-saving are. If X actually means that tampering is likely and diamond-saving is likely but their conjunction is vanishingly unlikely, the human simulator will give the same answers as if X meant they were still independent but both more likely.
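A toy numerical illustration of that claim (made-up numbers, and a deliberately simplified reading of 'looks safe' as 'actually saved or successfully tampered with'): two predictor belief-states with very different tampering/saving correlations but identical P(diamond looks safe | X) are indistinguishable to anything that only consumes that marginal probability.

```python
def p_looks_safe(p_save, p_tamper, p_both):
    """Predictor's P(diamond looks safe | X), with 'looks safe' simplified to
    'the diamond was actually saved OR the sensors were tampered with'."""
    return p_save + p_tamper - p_both

def human_simulator_report(p_looks_safe_given_x):
    """Stand-in for the human simulator: its output is some fixed function of
    the predictor's P(looks safe | X) plus human priors; it never sees the
    predictor's joint distribution over tampering and saving."""
    return f"answers computed from P(looks safe | X) = {p_looks_safe_given_x:.2f}"

# X makes saving and tampering independent, each fairly likely:
independent = p_looks_safe(p_save=0.60, p_tamper=0.60, p_both=0.36)     # 0.84
# X makes both moderately likely but their conjunction vanishingly unlikely:
anticorrelated = p_looks_safe(p_save=0.45, p_tamper=0.45, p_both=0.06)  # 0.84

print(human_simulator_report(independent))
print(human_simulator_report(anticorrelated))  # identical reports in both cases
```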

Comment by ADifferentAnonymous on Prizes for ELK proposals · 2022-01-19T18:16:49.540Z · LW · GW

Turning this into the typo thread, on page 97 you have

In Section: specificity we suggested penalizing reporters if they are consistent with many different reporters

Pretty sure the bolded word should be predictors.

Comment by ADifferentAnonymous on Why did Europe conquer the world? · 2021-12-29T02:58:35.299Z · LW · GW

I don't think the facts support the 'discontiguous empire -> peasants can't walk to the capital -> labor shortage' argument. The British Empire had continual migration from Britain to the colonies. Enslaved labor was likewise sent to the colonies.

Rather than a contracted labor supply, I think Britain experienced an increased labor demand due to quickly obtaining huge amounts of arable land (and displacing the former inhabitants rather than subjugating them).

Comment by ADifferentAnonymous on Consequentialism & corrigibility · 2021-12-14T22:14:42.137Z · LW · GW

In section 2.1 of the Indifference paper the reward function is defined on histories. In section 2 of the corrigibility paper, the utility function is defined over (action1, observation, action2) triples—which is to say, complete histories of the paper's three-timestep scenario.  And section 2 of the interruptibility paper specifies a reward at every timestep.
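As a type-level sketch of the distinction (my own placeholder names, not the papers' notation):

```python
from typing import Callable

class State: ...        # purely illustrative placeholder types
class Action: ...
class Observation: ...

# Preferences defined purely over future states:
UtilityOverStates = Callable[[State], float]

# What the corrigibility paper actually uses: a utility over the complete
# (action1, observation, action2) history of its three-timestep scenario.
UtilityOverHistory = Callable[[Action, Observation, Action], float]
```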

I think preferences-over-future-states might be a simplification used in thought experiments, not an actual constraint that has limited past corrigibility approaches.

Comment by ADifferentAnonymous on The Plan · 2021-12-12T03:26:10.022Z · LW · GW

As a long-time LW mostly-lurker, I can confirm I've always had the impression MIRI's proof-based stuff was supposed to be a spherical-cow model of agency that would lead to understanding of the messy real thing.

What I think John might be getting at is that (my outsider's impression of) MIRI has been more focused on "how would I build an agent" as a lens for understanding agency in general—e.g. answering questions about the agency of e-coli is not the type of work I think of. Which maybe maps to 'prescriptive' vs. 'descriptive'?

Comment by ADifferentAnonymous on Christiano, Cotra, and Yudkowsky on AI progress · 2021-11-28T19:51:28.754Z · LW · GW

An interesting parallel might be a parallel Earth making nanotechnology breakthroughs instead of AI breakthroughs, such that it's apparent they'll be capable of creating gray goo and not apparent they'll be able to avoid creating gray goo.

I guess a slow takeoff could be if, like, the first self-replicators took a day to double, so if somebody accidentally made a gram of gray goo you'd have weeks to figure it out and nuke the lab or whatever, but doubling times went down as technology improved, and so accidental unconstrained replicators happened periodically but could be contained until one couldn't be.

Whereas hard takeoff could be if you had nanobots that built stuff in seconds but couldn't self-replicate using random environmental mass, and then the first nanobot that can do that, can do it in seconds and eats the planet.

Should we consider the second scenario less likely because of smooth trend lines? Does Paul think we should? (I'm pretty sure Eliezer thinks that Paul thinks we should)

Comment by ADifferentAnonymous on Why Study Physics? · 2021-11-28T17:27:26.467Z · LW · GW

One major pattern of thought I picked up from (undergraduate) physics is respect for approximation. I worry that those who have this take it for granted, but the idea of a rigorous approximation that's provably accurate in certain limits, as opposed to a casual guess, isn't obvious until you've encountered it.

Comment by ADifferentAnonymous on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-23T17:20:29.629Z · LW · GW

My question after reading this is about Eliezer's predictions in a counterfactual without regulatory bottlenecks on economic growth. Would it change the probable outcome, or would we just get a better look at the oncoming AGI train before it hit us? (Or is there no such counterfactual well-defined enough to give us an answer?) ETA: Basically trying to get at whether that debate's actually a crux of anything.

Comment by ADifferentAnonymous on Average probabilities, not log odds · 2021-11-16T17:13:10.392Z · LW · GW

Oof, rookie mistake. I retract the claim that averaging log odds is 'the correct thing to do' in this case

Still—unless I'm wrong again—the average log odds would converge to the correct result in the limit of many forecasters, and the average probabilities wouldn't? Making the post title bad advice in such a case?

(Though median forecast would do just fine)

Comment by ADifferentAnonymous on Ngo and Yudkowsky on alignment difficulty · 2021-11-16T16:11:41.156Z · LW · GW

+1 to the question.

My current best guess at an answer:

There are easy safe ways, but not easy safe useful-enough ways. E.g. you could make your AI output DNA strings for a nanosystem and absolutely do not synthesize them, just have human scientists study them, and that would be a perfectly safe way to develop nanosystems in, say, 20 years instead of 50, except that you won't make it 2 years without some fool synthesizing the strings and ending the world. And more generally, any pathway that relies on humans achieving deep understanding of the pivotal act will take more than 2 years, unless you make 'human understanding' one of the AI's goals, in which case the AI is optimizing human brains and you've lost safety.