What do we know about how much protection COVID vaccines provide against transmitting the virus to others? 2021-05-06T07:39:48.366Z
What do we know about how much protection COVID vaccines provide against long COVID? 2021-05-06T07:39:16.873Z
What do the reported levels of protection offered by various vaccines mean? 2021-05-04T22:06:23.758Z
Did they use serological testing for COVID vaccine trials? 2021-05-04T21:48:30.507Z
When's the best time to get the 2nd dose of Pfizer Vaccine? 2021-04-30T05:11:27.936Z
Are there any good ways to place a bet on RadicalXChange and/or related ideas/mechanisms taking off in a big way? e.g. is there something to invest $$$ in? 2021-04-17T06:58:42.414Z
What does vaccine effectiveness as a function of time look like? 2021-04-17T00:36:20.366Z
How many micromorts do you get per UV-index-hour? 2021-03-30T17:23:26.566Z
AI x-risk reduction: why I chose academia over industry 2021-03-14T17:25:12.503Z
"Beliefs" vs. "Notions" 2021-03-12T16:04:31.194Z
Any work on honeypots (to detect treacherous turn attempts)? 2020-11-12T05:41:56.371Z
When was the term "AI alignment" coined? 2020-10-21T18:27:56.162Z
Has anyone researched specification gaming with biological animals? 2020-10-21T00:20:01.610Z
Is there any work on incorporating aleatoric uncertainty and/or inherent randomness into AIXI? 2020-10-04T08:10:56.400Z
capybaralet's Shortform 2020-08-27T21:38:18.144Z
A reductio ad absurdum for naive Functional/Computational Theory-of-Mind (FCToM). 2020-01-02T17:16:35.566Z
A list of good heuristics that the case for AI x-risk fails 2019-12-02T19:26:28.870Z
What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address. 2019-12-02T18:20:47.530Z
A fun calibration game: "0-hit Google phrases" 2019-11-21T01:13:10.667Z
Can indifference methods redeem person-affecting views? 2019-11-12T04:23:10.011Z
What are the reasons to *not* consider reducing AI-Xrisk the highest priority cause? 2019-08-20T21:45:12.118Z
Project Proposal: Considerations for trading off capabilities and safety impacts of AI research 2019-08-06T22:22:20.928Z
False assumptions and leaky abstractions in machine learning and AI safety 2019-06-28T04:54:47.119Z
Let's talk about "Convergent Rationality" 2019-06-12T21:53:35.356Z
X-risks are a tragedies of the commons 2019-02-07T02:48:25.825Z
My use of the phrase "Super-Human Feedback" 2019-02-06T19:11:11.734Z
Thoughts on Ben Garfinkel's "How sure are we about this AI stuff?" 2019-02-06T19:09:20.809Z
The role of epistemic vs. aleatory uncertainty in quantifying AI-Xrisk 2019-01-31T06:13:35.321Z
Imitation learning considered unsafe? 2019-01-06T15:48:36.078Z
Conceptual Analysis for AI Alignment 2018-12-30T00:46:38.014Z
Disambiguating "alignment" and related notions 2018-06-05T15:35:15.091Z
Problems with learning values from observation 2016-09-21T00:40:49.102Z
Risks from Approximate Value Learning 2016-08-27T19:34:06.178Z
Inefficient Games 2016-08-23T17:47:02.882Z
Should we enable public binding precommitments? 2016-07-31T19:47:05.588Z
A Basic Problem of Ethics: Panpsychism? 2015-01-27T06:27:20.028Z
A Somewhat Vague Proposal for Grounding Ethics in Physics 2015-01-27T05:45:52.991Z


Comment by capybaralet on What does vaccine effectiveness as a function of time look like? · 2021-04-17T05:31:02.268Z · LW · GW

From that figure, it looks to me like roughly 0 protection until day 10 or 11, and then near perfect protection after that.  Surprisingly non-smooth!

Comment by capybaralet on How many micromorts do you get per UV-index-hour? · 2021-03-31T17:30:31.732Z · LW · GW

Oh yeah, sorry I was not clear about this...

I am actually trying to consider just the effects via cancer risk in isolation, ignoring the potential benefits (which I think do go beyond just vitamin D... probably a lot of stuff is happening that we don't understand... it certainly seems to have an effect on mood, e.g.)

Comment by capybaralet on How many micromorts do you get per UV-index-hour? · 2021-03-31T17:29:58.849Z · LW · GW

Looks like just correlations, tho(?)
I basically wouldn't update on a single study that only looks at correlation.

Comment by capybaralet on AI x-risk reduction: why I chose academia over industry · 2021-03-17T01:17:39.608Z · LW · GW

You can try to partner with industry, and/or advocate for big government $$$.
I am generally more optimistic about toy problems than most people, I think, even for things like Debate.
Also, scaling laws can probably help here.

Comment by capybaralet on AI x-risk reduction: why I chose academia over industry · 2021-03-17T01:15:04.204Z · LW · GW

um sorta modulo a type error... risk is risk.  It doesn't mean the thing has happened (we need to start using some sort of phrase like "x-event" or something for that, I think).

Comment by capybaralet on AI x-risk reduction: why I chose academia over industry · 2021-03-16T01:56:16.393Z · LW · GW

Yeah we've definitely discussed it!  Rereading what I wrote, I did not clearly communicate what I intended to... I wanted to say that "I think the average trend was for people to update in my direction".  I will edit it accordingly.

I think the strength of the "usual reasons" has a lot to do with personal fit and what kind of research one wants to do.  Personally, I basically didn't consider salary as a factor.

Comment by capybaralet on AI x-risk reduction: why I chose academia over industry · 2021-03-16T01:49:02.716Z · LW · GW

When you say academia looks like a clear win within 5-10 years, is that assuming "academia" means "starting a tenure-track job now?" If instead one is considering whether to begin a PhD program, for example, would you say that the clear win range is more like 10-15 years?


Also, how important is being at a top-20 institution? If the tenure track offer was instead from University of Nowhere, would you change your recommendation and say go to industry?

My cut-off was probably somewhere between top-50 and top-100, and I was prepared to go anywhere in the world.  If I couldn't make it into the top 100, I think I would definitely have reconsidered academia.  If you're ready to go anywhere, I think it makes it much easier to find somewhere with high EV (but you might have to move up the risk/reward curve a lot).

Would you agree that if the industry project you could work on is the one that will eventually build TAI (or be one of the leading builders, if there are multiple) then you have more influence from inside than from outside in academia?

Yes.  But ofc it's hard to know if that's the case.  I also think TAI is a less important category for me than x-risk inducing AI.

Comment by capybaralet on "Beliefs" vs. "Notions" · 2021-03-14T17:28:16.825Z · LW · GW

Thanks!  Quick question: how do you think these notions compare to factors in an undirected graphical model?  (This is the closest thing I know of to how I imagine "notions" being formalized).
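For reference, a factor in an undirected graphical model (Markov random field) is a non-negative function over a subset of the variables, and the joint distribution is the product of all factors divided by a normalizing constant Z. A minimal sketch, assuming two hypothetical pairwise factors over three binary variables (the factor values are illustrative only):

```python
from itertools import product

# Hypothetical pairwise factors: each rewards agreement between two variables.
def phi_ab(a, b):
    return 3.0 if a == b else 1.0

def phi_bc(b, c):
    return 2.0 if b == c else 1.0

def unnormalized(a, b, c):
    """Product of all factors: the unnormalized score of one joint assignment."""
    return phi_ab(a, b) * phi_bc(b, c)

# Partition function: sum of unnormalized scores over every assignment.
Z = sum(unnormalized(a, b, c) for a, b, c in product([0, 1], repeat=3))

def prob(a, b, c):
    return unnormalized(a, b, c) / Z

print(Z)              # 24.0
print(prob(0, 0, 0))  # 0.25
```

No single factor is a probability; each only says which local configurations are more or less compatible, and meaning emerges once they are multiplied and normalized.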

Comment by capybaralet on "Beliefs" vs. "Notions" · 2021-03-14T16:24:40.892Z · LW · GW

Cool!  Can you give a more specific link please?

Comment by capybaralet on "Beliefs" vs. "Notions" · 2021-03-14T16:23:57.194Z · LW · GW

True, but it seems the meaning I'm using it for is primary:

Comment by capybaralet on Imitative Generalisation (AKA 'Learning the Prior') · 2021-03-14T06:19:44.929Z · LW · GW

It seems like z* is meant to represent "what the human thinks the task is, based on looking at D".
So why not just try to extract the posterior directly, instead of the prior and the likelihood separately?
(And then it seems like this whole thing reduces to "ask a human to specify the task".)

Comment by capybaralet on [AN #141]: The case for practicing alignment work on GPT-3 and other large models · 2021-03-12T02:28:51.620Z · LW · GW

Interesting... Maybe this comes down to different taste or something.  I understand, but don't agree with, the cow analogy... I'm not sure why, but one difference is that I think we know more about cows than about DNNs, or something.

I haven't thought about the Zipf-distributed thing.

> Taken literally, this is easy to do. Neural nets often get the right answer on never-before-seen data points, whereas Hutter's model doesn't. Presumably you mean something else but idk what.

I'd like to see Hutter's model "translated" a bit to DNNs, e.g. by assuming they get anything right that's within epsilon of a training data point or something... maybe it even ends up looking like the other model in that context...
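One way to make that "translation" concrete, as a minimal sketch: a hypothetical model on 1-D inputs that answers a test point correctly iff it lies within eps of some training point, and gets everything else wrong. The function name, the uniform data, and the eps value are illustrative assumptions, not Hutter's actual setup.

```python
import random

def epsilon_memorizer_accuracy(train_xs, test_xs, eps):
    """Fraction of test points lying within eps of at least one training point.
    Hypothetical model: inside that radius the answer is right, outside it is wrong."""
    hits = sum(any(abs(x - t) <= eps for t in train_xs) for x in test_xs)
    return hits / len(test_xs)

rng = random.Random(0)  # fixed seed so the run is reproducible
test_xs = [rng.random() for _ in range(1_000)]
for n in [10, 100, 1_000]:
    train_xs = [rng.random() for _ in range(n)]
    print(n, epsilon_memorizer_accuracy(train_xs, test_xs, eps=0.001))
```

Here accuracy improves only because more of the input space gets covered by training points, which is the "local generalization" flavor of the idea.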


Comment by capybaralet on [AN #141]: The case for practicing alignment work on GPT-3 and other large models · 2021-03-10T19:53:20.999Z · LW · GW

I have a hard time saying which of the scaling laws explanations I like better (I haven't read either paper in detail, but I think I got the gist of both).
What's interesting about Hutter's is that the model is so simple, and doesn't require generalization at all. 
I feel like there's a pretty strong Occam's Razor-esque argument for preferring Hutter's model, even though it seems wildly less intuitive to me.
Or maybe what I want to say is more like "Hutter's model DEMANDS refutation/falsification".

I think both models are also very interesting for understanding DNN generalization... I really think it goes beyond memorization and local generalization (c.f. …), but it's interesting that those are basically the mechanisms proposed by Hutter and Sharma & Kaplan (resp.)...
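A minimal sketch of the memorization-only picture, assuming a learner that is correct on a test item iff that exact item appeared in its n i.i.d. training samples, with items drawn from a truncated Zipf distribution. The truncation size and exponent are hypothetical choices. The point is only that the expected error, the sum over items of p_i (1 - p_i)^n, keeps falling with n without any generalization at all.

```python
def zipf_probs(num_items, alpha):
    """Hypothetical truncated Zipf distribution: P(item i) proportional to i^-(1 + alpha)."""
    weights = [i ** -(1 + alpha) for i in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def memorization_error(probs, n):
    """Expected test error of a pure memorizer after n i.i.d. training samples:
    an item is missed iff it was never drawn, which has probability (1 - p)^n."""
    return sum(p * (1 - p) ** n for p in probs)

probs = zipf_probs(num_items=100_000, alpha=1.0)
for n in [10, 100, 1_000, 10_000]:
    print(n, round(memorization_error(probs, n), 4))
```

Plotting these values on log-log axes is a quick way to see whether the decay looks like a power law for a given exponent.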


Comment by capybaralet on The case for aligning narrowly superhuman models · 2021-03-09T13:59:46.229Z · LW · GW

Thanks for the response!
I see the approaches as more complementary.
Again, I think this is in keeping with standard/good ML practice.

A prototypical ML paper might first describe a motivating intuition, then formalize it via a formal model and demonstrate the intuition in that model (empirically or theoretically), then finally show the effect on real data.

The problem with only doing the real data (i.e. at scale) experiments is that it can be hard to isolate the phenomena you wish to study.  And so a positive result does less to confirm the motivating intuition, as there are many other factors at play that might be responsible.  We've seen this happen rather a lot in Deep Learning and Deep RL, in part because of the focus on empirical performance over a more scientific approach.

Comment by capybaralet on The feeling of breaking an Overton window · 2021-03-09T13:50:54.317Z · LW · GW

I think I know the feeling quite well.  I think for me anyways, it's basically "fear of being made fun of", stemming back to childhood.  I got made fun of a lot, and physically bullied as well (a few examples that jump to mind are: having my face shoved into the snow until I was scared of suffocating, being body slammed and squished the whole 45-minute bus ride home because I sat in the back seat (which the "big kids" claimed as their right), being shoulder-checked in the hall).

At some point I developed an attitude of "fuck those people", and decided to try to notice this feeling and not let it hold me back ever.  It's hard and I'm still not great at it.  I still get this feeling kind of often, mostly when I know I'm standing out and will get looks or comments on it, e.g. wearing my Narwall mask to go grocery shopping.  But to me it's more like a sign to dig in my heels.

I view this as part of an ongoing project to overcome my inhibitions.  I try to remain aware of my inhibitions, although it can be painful to recognize them, since they can be really limiting, and make one feel weak and ashamed.  I would guess most people typically rationalize their discomforts as stemming from something legitimate.  This seems really bad from the point of view of having accurate beliefs.


Comment by capybaralet on The case for aligning narrowly superhuman models · 2021-03-08T00:35:02.153Z · LW · GW

I haven't read this in detail (hope to in the future); I only skimmed based on section headers.
I think the stuff about "what kinds of projects count" and "advantages over other genres" seem to miss an important alternative, which is to build and study toy models of the phenomena we care about.  This is a bit like the gridworlds stuff, but I thought the description of that work missed its potential, and didn't provide much of an argument for why working at scale would be more valuable.

This approach (building and studying toy models) is popular in ML research, and the leaders of the field (e.g. Rich Sutton) are big proponents of it, and think it is undervalued in the current research climate.  I agree.
Shameless plug for my work that follows this approach:

A relevant example would be to build toy models of "inaccessible information", and try to devise methods of extracting that information.  

This type of research fails your criteria for what "counts" with flying colors, but in my mind it seems approximately equally valuable to the kind of experiments you seem to have in mind -- and much cheaper to perform! 

Comment by capybaralet on Fun with +12 OOMs of Compute · 2021-03-05T13:50:07.483Z · LW · GW

There's a ton of work in meta-learning, including Neural Architecture Search (NAS).  AI-GAs (Clune) is a paper that argues a similar POV to what I would describe here, so I'd check that out.

I'll just say "why it would be powerful": the promise of meta-learning is that -- just like learned features outperform engineered features -- learned learning algorithms will eventually outperform engineered learning algorithms.  Taking the analogy seriously would suggest that the performance gap will be large -- a quantitative step-change.  

The upper limit we should anchor on is fully automated research.  This helps illustrate how powerful this could be, since automating research could easily give many orders of magnitude speed up (e.g. just consider the physical limitation of humans manually inputting information about what experiment to run).

An important underlying question is how much room there is for improvement over current techniques.  The idea that current DL techniques are pretty close to perfect (i.e. we've uncovered the fundamental principles of efficient learning (associated view: ...and maybe DNNs are a good model of the brain)) seems too often implicit in some of the discussions around forecasting and scaling.  I think it's a real possibility, but I think it's fairly unlikely (~15%, OTTMH).  The main evidence for it is that 99% of published improvements don't seem to make much difference in practice/at-scale.  

Assuming that current methods are roughly optimal has two important implications:
- no new fundamental breakthroughs needed for AGI (faster timelines)
- no possible acceleration from fundamental algorithmic breakthroughs (slower timelines)


Comment by capybaralet on Fun with +12 OOMs of Compute · 2021-03-02T21:15:11.436Z · LW · GW

Sure, but in what way?
Also I'd be happy to do a quick video chat if that would help (PM me).

Comment by capybaralet on Fun with +12 OOMs of Compute · 2021-03-02T03:16:57.500Z · LW · GW

I only read the prompt.  
But I want to say: that much compute would be useful for meta-learning/NAS/AI-GAs, not just scaling up DNNs.  I think that would likely be a more productive research direction.  And I want to make sure that people are not ONLY imagining bigger DNNs when they imagine having a bunch more compute, but also imagining how it could be used to drive fundamental advances in ML algos, which could plausibly kick off something like recursive self-improvement (even if DNNs are in some sense a dead end).

Comment by capybaralet on the scaling “inconsistency”: openAI’s new insight · 2021-02-01T12:11:16.379Z · LW · GW

if your model gets more sample-efficient as it gets larger & n gets larger, it's because it's increasingly approaching a Bayes-optimal learner and so it gets more out of the more data, but then when you hit the Bayes-limit, how are you going to learn more from each datapoint? You have to switch over to a different and inferior scaling law. You can't squeeze blood from a stone; once you approach the intrinsic entropy, there's not much to learn.

I found this confusing.  It sort of seems like you're assuming that a Bayes-optimal learner achieves the Bayes error rate (are you?), which seems wrong to me.

  • What do you mean "the Bayes-limit"?  At first, I assumed you were talking about the Bayes error rate, but that is (roughly) the error you could expect to achieve with infinite data, and we're still talking about finite data.
  • What do you mean "Bayes-optimal learner"?  I assume you just mean something that performs Bayes rule exactly (so depends on the prior/data).  
  • I'm confused by you talking about "approach[ing] the intrinsic entropy"... it seems like the figure in OP shows L(C) approaching L(D).  But is L(D) supposed to represent intrinsic entropy?  should we trust it as an estimate of intrinsic entropy?
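To pin down the distinction in the first bullet: the Bayes error rate is a property of the data distribution, namely the error of the optimal classifier that knows the true distribution, which a learner could at best approach with unlimited data. A minimal sketch for a textbook case, assuming two equally likely classes with Gaussian class-conditional densities of equal variance, where the optimal rule thresholds at the midpoint between the means:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def bayes_error_two_gaussians(mu0, mu1, sigma):
    """Bayes error rate for equally likely classes N(mu0, sigma^2) and N(mu1, sigma^2):
    the optimal classifier errs when a point lands past the midpoint between the means."""
    return normal_cdf(-abs(mu1 - mu0) / (2 * sigma))

print(round(bayes_error_two_gaussians(0.0, 2.0, 1.0), 4))  # 0.1587
```

A Bayes-optimal learner working from a prior and finitely many samples will generally do worse than this number, which is why the two notions come apart.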

I also don't see how active learning is supposed to help (unless you're talking about actively generating data)... I thought the whole point you were trying to make is that once you reach the Bayes error rate there's literally nothing you can do to keep improving without more data.  
You talk about using active learning to throw out data-points... but I thought the problem was not having enough data?  So how is throwing out data supposed to help with that?

Comment by capybaralet on capybaralet's Shortform · 2020-12-16T20:37:26.273Z · LW · GW

I basically agree, but I do assign it to Moloch. *shrug*

Comment by capybaralet on Any work on honeypots (to detect treacherous turn attempts)? · 2020-11-16T21:49:49.717Z · LW · GW

I strongly disagree.  
I think this is emblematic of the classic AI safety perspective/attitude, which has impeded and discouraged practical progress towards reducing AI x-risk by supporting an unnecessary and misleading emphasis on "ultimate solutions" that address the "arbitrarily intelligent agent trapped in a computer" threat model.
This is an important threat model, but it is just one of many.

My question is inspired by the situation where a scaled up GPT-3-like model is fine-tuned using RL and/or reward modelling.  In this case, it seems like we can honeypot the model during the initial training and have a good chance of catching it attempting a premature treacherous turn.  Whether or not the model would attempt a premature treacherous turn seems to depend on several factors.  
A hand-wavy argument for this strategy working is: an AI should conceive of the treacherous turn strategy before the honeypot counter-strategy because a counter-strategy presupposes the strategy it counters.

There are several reasons that make this not a brilliant research opportunity. Firstly, what is and is not a honeypot is sensitively dependant on the AI's capabilities and situation. There is no such thing as a one size fits all honeypot. 

I am more sympathetic to this argument, but it doesn't prevent us from doing research that is limited to specific situations.  It also proves too much, since combining this line of reasoning with no-free-lunch arguments would seem to invalidate all of machine learning.

Comment by capybaralet on Tips for the most immersive video calls · 2020-10-29T04:00:34.477Z · LW · GW

Any tips for someone who's already bought the C920 and isn't happy with the webcam on their computer?  (e.g. details on the 2 hour process :P)

Comment by capybaralet on Has anyone researched specification gaming with biological animals? · 2020-10-21T00:21:20.822Z · LW · GW

There are probably a lot of things that people do with animals that can be viewed as "automatic training", but I don't think people are viewing them this way, or trying to create richer reward signals that would encourage the animals to demonstrate increasingly impressive feats of intelligence.

Comment by capybaralet on Industrial literacy · 2020-10-05T20:17:51.447Z · LW · GW

The claim I'm objecting to is:

all soil loses its fertility naturally over time

I guess your interpretation of "naturally" is "when non-sustainably farmed"? ;) 

My impression is that we know how to keep farmland productive without using fertilizers by rotating crops, letting fields lie fallow sometimes, and involving fauna.  Of course, this might be much less efficient than using synthetic fertilizers, so I'm not saying that's what we should be doing. 

Comment by capybaralet on Is there any work on incorporating aleatoric uncertainty and/or inherent randomness into AIXI? · 2020-10-05T07:11:52.802Z · LW · GW

Is there a reference for this?

I was inspired to think of this by this puzzle (which I interpret as being about the distinction between epistemic and aleatoric uncertainty):

"To present another example, suppose that five tosses of a given coin are planned and that the agent has equal strength of belief for two outcomes, both beginning with H, say the outcomes HTTHT and HHTTH. Suppose the first toss is made, and results in a head. If all that the agent learns is that a head occurred on the first toss it seems unreasonable for him to move to a greater confidence in the occurrence of one sequence rather than another. The only thing he has found out is something which is logically implied by both propositions, and hence, it seems plausible to say, fails to differentiate between them.

This second example might be challenged along the following lines: The case might be one in which initially the agent is moderately confident that the coin is either biased toward heads or toward tails. But he has as strong a belief that the bias is the one way as the other. So initially he has the same degree of confidence that H will occur as that T will occur on any given toss, and so, by symmetry considerations, an equal degree of confidence in HTTHT and HHTTH. Now if H is observed on the first toss it is reasonable for the agent to have slightly more confidence that the coin is biased toward heads than toward tails. And if so it might seem he now should have more confidence that the sequence should conclude with the results HTTH than TTHT because the first of these sequence has more heads in it than tails."

Which is right?

What's striking to me is that the 2nd argument seems clearly correct, but only seems to work if you make a distinction between epistemic and aleatoric uncertainty, which I don't think AIXI does.  So that makes me wonder if it's doing something wrong (or if people who use Beta distributions to model coin flips are(!!))
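For concreteness, the Beta-distribution approach mentioned above separates the two kinds of uncertainty: uncertainty about the coin's bias (epistemic) shifts with each observation, while a coin whose bias is known (purely aleatoric uncertainty) predicts the same thing no matter what is observed. A minimal sketch, assuming a uniform Beta(1, 1) prior; the function names are illustrative:

```python
from fractions import Fraction

def beta_bernoulli_predictive(alpha, beta, heads, tails):
    """Posterior predictive P(next toss = H) under a Beta(alpha, beta) prior on the
    coin's bias, after observing the given numbers of heads and tails."""
    return Fraction(alpha + heads, alpha + beta + heads + tails)

def known_bias_predictive(p_heads, heads=0, tails=0):
    """With a known bias, observations are deliberately ignored."""
    return Fraction(p_heads)

print(beta_bernoulli_predictive(1, 1, 0, 0))          # 1/2
print(beta_bernoulli_predictive(1, 1, 1, 0))          # 2/3
print(known_bias_predictive(Fraction(1, 2), heads=1))  # 1/2
```

Both models assign 1/2 to heads before any tosses; they only diverge after the first observation.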


Comment by capybaralet on Is there any work on incorporating aleatoric uncertainty and/or inherent randomness into AIXI? · 2020-10-05T07:03:55.118Z · LW · GW

Yeah that seems right.  But I'm not aware of any such work OTTMH.

Comment by capybaralet on Weird Things About Money · 2020-10-05T04:00:09.610Z · LW · GW

I really like this.  I read part 1 as being about the way the economy or society implicitly imposes additional pressures on individuals' utility functions.  Can you provide a reference for the theorem that Kelly bettors predominate?

ETA: an observation: the arguments for expected value also assume infinite value is possible, which (modulo infinite-ethics-style concerns, a significant caveat...) also isn't realistic.
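For readers unfamiliar with the Kelly criterion: for a repeated binary bet that wins with probability p at net odds b, the Kelly bettor stakes the fraction f* = p - (1 - p)/b of current wealth, which maximizes expected log-wealth growth, whereas maximizing expected wealth per bet says to stake everything. A minimal sketch with hypothetical numbers; it shows the quantities involved, not the predominance theorem itself:

```python
import math

def kelly_fraction(p, b):
    """Kelly-optimal fraction of wealth to stake on a bet that wins with
    probability p and pays b-to-1. A negative value means: don't bet."""
    return p - (1 - p) / b

def expected_log_growth(f, p, b):
    """Expected change in log-wealth per bet when staking fraction f."""
    if f >= 1:
        return -math.inf  # staking everything means ruin on the first loss
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

p, b = 0.6, 1.0
f_star = kelly_fraction(p, b)
print(round(f_star, 2))                      # 0.2
print(expected_log_growth(f_star, p, b) > expected_log_growth(0.5, p, b))  # True
print(expected_log_growth(1.0, p, b))        # -inf
```

The expected-value maximizer's all-in strategy has the highest expected wealth after each bet but negative-infinite expected log growth, which is the tension the predominance result is about.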


Comment by capybaralet on AGI safety from first principles: Control · 2020-10-04T08:38:23.886Z · LW · GW

Which previous arguments are you referring to?

Comment by capybaralet on Industrial literacy · 2020-10-04T08:27:08.134Z · LW · GW

That the food you eat is grown using synthetic fertilizers, and that this is needed for agricultural productivity, because all soil loses its fertility naturally over time if it is not deliberately replenished.

This claim doesn't make sense.  If it were true, plants would not have survived to the present day.

Steelmanning (which I would say OP doesn't do a good job of...), I'll interpret this as: "we are technologically reliant on synthetic fertilizers to grow enough food to feed the current population".  But in any case, there are harmful environmental consequences to our current practice that seem somewhat urgent to address:

Comment by capybaralet on capybaralet's Shortform · 2020-10-04T02:19:21.593Z · LW · GW

Some possible implications of more powerful AI/technology for privacy:

1) It's as if all of your logged data gets pored over by a team of super-detectives to make informed guesses about every aspect of your life, even those that seem completely unrelated to those kinds of data.

2) Even data that you try to hide can be recovered, e.g. by reverse-engineering what you type from the sound of your typing.

3) Powerful actors will deploy advanced systems to model, predict, and influence your behavior, and extreme privacy precautions starting now may be warranted.

4) On the other hand, if you DON'T have a significant digital footprint, you may be significantly less trustworthy.  If AI systems don't know what to make of you, you may be the first up against the wall (compare with seeking credit without having a credit history).

5) On the other other hand ("on the foot"?), if you trust that future societies will be more enlightened, then you may be retroactively rewarded for being more enlightened today.

Anything important I left out?

Comment by capybaralet on capybaralet's Shortform · 2020-10-03T23:45:37.222Z · LW · GW

Comment by capybaralet on capybaralet's Shortform · 2020-10-03T23:44:23.049Z · LW · GW

We learned about RICE as a treatment for injuries (e.g. sprains) in middle school, and it has since struck me as odd that you would want to inhibit the body's natural healing response.

It seems like RICE is being questioned by medical professionals, as well, but consensus is far off.

Anyone have thoughts/knowledge about this?

Comment by capybaralet on capybaralet's Shortform · 2020-09-30T08:04:44.814Z · LW · GW

Whelp... that's scary: 
Chip Huyen
4. You won’t need to update your models as much One mindboggling fact about DevOps: Etsy deploys 50 times/day. Netflix 1000s times/day. AWS every 11.7 seconds. MLOps isn’t an exemption. For online ML systems, you want to update them as fast as humanly possible. (5/6)

Comment by capybaralet on Inviting Curated Authors to Give 5-Min Online Talks · 2020-09-24T02:48:41.323Z · LW · GW

I think I've had a few curated posts.  How could I find them?

Comment by capybaralet on Radical Probabilism [Transcript] · 2020-09-24T02:46:47.914Z · LW · GW

Abram Demski: But it's like, how do you do that if “I don't have a good hypothesis” doesn't make any predictions?

One way you can imagine this working is that you treat “I don't have a good hypothesis” as a special hypothesis that is not required to normalize to 1.  
For instance, it could say that observing any particular real number, r, has probability epsilon > 0.
So now it "makes predictions", but this doesn't just collapse to including another hypothesis and using Bayes rule.

You can also imagine updating this special hypothesis (which I called a "Socratic hypothesis" in comments on the original blog post on Radical Probabilism) in various ways. 
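A minimal sketch of one way to read this, assuming the special hypothesis is reweighted like the others but with the same constant score epsilon for every observation, which does not integrate to 1 over the reals. The Gaussian hypotheses, prior weights, and epsilon value are all hypothetical:

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def update(weights, hypotheses, x, epsilon):
    """Reweight hypotheses after observing the real number x.

    weights: name -> current weight, including the special key "socratic".
    hypotheses: name -> (mean, std) for the ordinary Gaussian hypotheses.
    The "socratic" hypothesis scores every observation with the same epsilon,
    so it is not a normalized density over the reals.
    """
    scores = {}
    for name, w in weights.items():
        if name == "socratic":
            scores[name] = w * epsilon
        else:
            mean, std = hypotheses[name]
            scores[name] = w * gaussian_pdf(x, mean, std)
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

hypotheses = {"h0": (0.0, 1.0), "h1": (5.0, 1.0)}
weights = {"h0": 0.45, "h1": 0.45, "socratic": 0.1}
epsilon = 0.01

print(update(weights, hypotheses, 0.2, epsilon))   # most weight on h0
print(update(weights, hypotheses, 50.0, epsilon))  # nearly all weight on "socratic"
```

When an observation is far from every ordinary hypothesis, the constant score wins and weight flows to "I don't have a good hypothesis", which is the behavior one would want from it.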

Comment by capybaralet on [AN #118]: Risks, solutions, and prioritization in a world with many AI systems · 2020-09-24T02:24:20.762Z · LW · GW

Regarding ARCHES, as an author:

  • I disagree with Critch that we should expect single/single delegation(/alignment) to be solved "by default" because of economic incentives.  I think economic incentives will not lead to it being solved well enough, soon enough (e.g. see: …).  I guess Critch might put this in the "multi/multi" camp, but I think it's more general (e.g. I attribute a lot of the risk here to human irrationality/carelessness).
  • RE: "I find the argument less persuasive because we do have governance, regulations, national security etc. that would already be trying to mitigate issues that arise in multi-multi contexts, especially things that could plausibly cause extinction"... 1) These are all failing us when it comes to, e.g. climate change.  2) I don't think we should expect our institutions to keep up with rapid technological progress (you might say they are already failing to...).  My thought experiment from the paper is: "imagine if everyone woke up 1000000x smarter tomorrow."  Our current institutions would likely not survive the day and might or might not be improved quickly enough to keep ahead of bad actors / out-of-control conflict spirals.

Comment by capybaralet on [AN #118]: Risks, solutions, and prioritization in a world with many AI systems · 2020-09-24T02:06:15.480Z · LW · GW

these usually don’t assume “no intervention from longtermists”

I think the "don't" is a typo?

Comment by capybaralet on Why GPT wants to mesa-optimize & how we might change this · 2020-09-23T21:30:30.802Z · LW · GW

By managing incentives I expect we can, in practice, do things like: "[telling it to] restrict its lookahead to particular domains"... or remove any incentive for control of the environment.

I think we're talking past each other a bit here.

Comment by capybaralet on capybaralet's Shortform · 2020-09-23T21:27:00.273Z · LW · GW

For all of the hubbub about trying to elaborate better arguments for AI x-risk, it seems like a lot of people are describing the arguments in Superintelligence as relying on FOOM, agenty AI systems, etc. without actually justifying that description via references to the text.

It's been a while since I read Superintelligence, but my memory was that it anticipated a lot of counter-arguments quite well.  I'm not convinced that it requires such strong premises to make a compelling case.  So maybe someone interested in this project of clarifying the arguments should start by establishing that the arguments in Superintelligence really have the weaknesses they are claimed to have?

Comment by capybaralet on Why GPT wants to mesa-optimize & how we might change this · 2020-09-23T06:19:13.859Z · LW · GW

My intuitions on this matter are:
1) Stopping mesa-optimizing completely seems mad hard.
2) Managing "incentives" is the best way to deal with this stuff, and will probably scale to something like 1,000,000x human intelligence. 
3) On the other hand, it probably won't scale forever.

To elaborate on the incentive management thing... if we figure that stuff out and do it right and it has the promise that I think it does... then it won't restrict lookahead to particular domains, but it will remove incentives for instrumental goal seeking.  

If we're still in a situation where the AI doesn't understand its physical environment and isn't incentivized to learn to control it, then we can do simple things like use a fixed dataset (as opposed to data we're collecting online) in order to make it harder for the AI to learn anything significant about its physical environment. 

Learning about the physical environment and using it to improve performance is not necessarily bad/scary absent incentives for control.  However, I worry that having a good world model makes an AI much more liable to infer that it should try to control and not just predict the world.

Comment by capybaralet on capybaralet's Shortform · 2020-09-23T06:10:12.367Z · LW · GW

Moloch is not about coordination failures.  Moloch is about the triumph of instrumental goals.  Maybe we can defeat Moloch with sufficiently good coordination.  It's worth a shot at least.

Comment by capybaralet on capybaralet's Shortform · 2020-09-22T00:36:51.522Z · LW · GW

Treacherous turns don't necessarily happen all at once. An AI system can start covertly recruiting resources outside its intended purview in preparation for a more overt power grab.

This can happen during training, without a deliberate "deployment" event. Once the AI has started recruiting resources, it can outperform, on-distribution, AI systems that haven't done so, while still having resources left over that it can devote to pursuing its true objective or instrumental goals.

Comment by capybaralet on Why GPT wants to mesa-optimize & how we might change this · 2020-09-21T00:11:34.878Z · LW · GW

I didn't read the post (yet...), but I'm immediately skeptical of the claim that beam search is useful here ("in principle"), since GPT-3 is just doing next-step prediction (IIUC, it is never trained on its own outputs). This means it should always just match the conditional P(x_t | x_1, ..., x_{t-1}). That conditional can itself be viewed as being informed by possible future sequences, but conservation of expected evidence says we shouldn't be able to gain anything by doing beam search if we already know that conditional. Now, it's true that efficiently estimating that conditional in a single forward pass of a transformer might sometimes involve approximations to beam search.
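To make the conservation-of-expected-evidence point concrete, here is a minimal sketch on a hypothetical two-token toy model (all names and probabilities are made up for illustration, and nothing here is GPT-3 specific). With an empty prefix, P(x1) plays the role of P(x_t | x_1, ..., x_{t-1}); "looking ahead" at the next token and then averaging the resulting beliefs over possible futures just gives back P(x1).

```python
import math

# Hypothetical two-step toy "language model"; the numbers are made up for illustration.
P_X1 = {"a": 0.4, "b": 0.6}   # P(x1)
P_X2_GIVEN_X1 = {             # P(x2 | x1)
    "a": {"c": 1.0, "d": 0.0},
    "b": {"c": 0.5, "d": 0.5},
}
VOCAB_X2 = ("c", "d")


def joint(x1, x2):
    """P(x1, x2) = P(x1) * P(x2 | x1)."""
    return P_X1[x1] * P_X2_GIVEN_X1[x1][x2]


def marginal_x2(x2):
    """P(x2), obtained by summing the joint over x1."""
    return sum(joint(x1, x2) for x1 in P_X1)


def expected_lookahead_belief(x1):
    """E_{x2 ~ P(x2)}[P(x1 | x2)]: belief about x1 after peeking at x2, averaged over futures."""
    total = 0.0
    for x2 in VOCAB_X2:
        p_x2 = marginal_x2(x2)
        if p_x2 > 0:
            total += p_x2 * (joint(x1, x2) / p_x2)
    return total


# Averaging over imagined futures recovers the conditional we started with.
for token, p in P_X1.items():
    assert math.isclose(expected_lookahead_belief(token), p)
    print(token, p, expected_lookahead_belief(token))
```

The averaging step is just the law of total probability; the sketch only shows that, for prediction, lookahead under the same model adds no information beyond the conditional itself.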

At a high level, I don't think we really need to be concerned with this form of "internal lookahead" unless/until it starts to incorporate mechanisms outside of the intended software environment (e.g. the hardware, humans, the external (non-virtual) world).

Comment by capybaralet on Why GPT wants to mesa-optimize & how we might change this · 2020-09-21T00:02:00.728Z · LW · GW

Seq2seq used beam search and found that it helped. It was standard practice in the early days of NMT; I'm not sure when that changed.
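One standard way to see why beam search can help in translation-style decoding, even when the next-token conditionals are exact, is that it targets the most probable *whole output*, which greedy next-token decoding can miss. A minimal sketch on a hypothetical toy model (names and numbers are illustrative assumptions):

```python
# Hypothetical toy two-step decoder; probabilities are made up for illustration.
P_X1 = {"a": 0.4, "b": 0.6}
P_X2_GIVEN_X1 = {
    "a": {"c": 1.0, "d": 0.0},
    "b": {"c": 0.5, "d": 0.5},
}


def greedy_decode():
    """Pick the most likely token at each step, conditioning on earlier choices."""
    x1 = max(P_X1, key=P_X1.get)
    x2 = max(P_X2_GIVEN_X1[x1], key=P_X2_GIVEN_X1[x1].get)
    return (x1, x2), P_X1[x1] * P_X2_GIVEN_X1[x1][x2]


def beam_decode(width=2):
    """Keep the `width` best prefixes after step 1, then return the best full sequence."""
    beams = sorted(P_X1.items(), key=lambda kv: kv[1], reverse=True)[:width]
    candidates = [
        ((x1, x2), p1 * p2)
        for x1, p1 in beams
        for x2, p2 in P_X2_GIVEN_X1[x1].items()
    ]
    return max(candidates, key=lambda c: c[1])


print("greedy:", greedy_decode())  # commits to "b", ends with sequence probability 0.3
print("beam:  ", beam_decode())    # finds ("a", "c") with probability 0.4
```

This is a question about the decoding objective (a single high-probability sequence), which is separate from whether lookahead improves next-step prediction; with `width=1`, the sketch reduces to greedy decoding.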

This blog post gives some insight into why beam search might not be a good idea, and is generally very interesting:

Comment by capybaralet on Radical Probabilism · 2020-09-20T23:57:32.509Z · LW · GW

This blog post seems superficially similar, but I can't say ATM if there are any interesting/meaningful connections:

Comment by capybaralet on AI Research Considerations for Human Existential Safety (ARCHES) · 2020-09-19T04:54:37.962Z · LW · GW

There is now also an interview with Critch here:

Comment by capybaralet on capybaralet's Shortform · 2020-09-18T23:30:30.114Z · LW · GW

A lot of the discussion of mesa-optimization seems confused.

One thing that might help clear up the confusion is simply to remember that "learning" and "inference" should not be thought of as cleanly separated in the first place; see, e.g., AIXI.

So when we ask "Is it learning, or just solving the task without learning?", this seems like a confused framing to me. Suppose your ML system learned an excellent prior, and then just did Bayesian inference at test time. Is that learning? Sure, why not. It might not use a traditional search/optimization algorithm, but it probably has to do *something* like that for computational reasons if it wants to do efficient approximate Bayesian inference over a large hypothesis space, so...

Comment by capybaralet on Developmental Stages of GPTs · 2020-09-18T23:25:57.570Z · LW · GW
Sometimes people will give GPT-3 a prompt with some examples of inputs along with the sorts of responses they'd like to see from GPT-3 in response to those inputs ("few-shot learning", right? I don't know what 0-shot learning you're referring to.)

No, that's zero-shot. Few-shot is when you train on those instead of just stuffing them into the context.

It looks like mesa-optimization because it seems to be doing something like learning about new tasks or new prompts that are very different from anything it's seen before, without any training, just based on the context (0-shot).

Is your claim that GPT-3 succeeds at this sort of task by doing something akin to training a model internally?

By "training a model", I assume you mean "an ML model" (as opposed to, e.g., a world model). Yes, I am claiming something like that, but learning vs. inference is a blurry line.

I'm not saying it's doing SGD; I don't know what it's doing in order to solve these new tasks. But to be clear, 96 steps of gradient descent (roughly one per layer of GPT-3's 96 layers) could be a lot. MAML can do meta-learning with just 1.
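To illustrate how much a single gradient step can do when the starting point is meta-learned, here is a minimal sketch of one-step, second-order MAML on a hypothetical 1-D regression task family y = a * x. The task distribution, step sizes, and function names are illustrative assumptions, not taken from the MAML paper's experiments.

```python
import random

ALPHA = 0.5   # inner-loop (adaptation) step size; hypothetical value
BETA = 0.05   # outer-loop (meta) step size; hypothetical value


def loss(w, xs, ys):
    """Mean squared error of the 1-D linear model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w, xs, ys):
    """d loss / d w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def hess(xs):
    """d^2 loss / d w^2 (constant in w for this model)."""
    return sum(2 * x * x for x in xs) / len(xs)


def adapt(w, xs, ys):
    """MAML inner loop with a single gradient step."""
    return w - ALPHA * grad(w, xs, ys)


def sample_task(rng, n=10):
    """Hypothetical task family y = a * x with a ~ U(1, 3); returns (train, test) splits."""
    a = rng.uniform(1.0, 3.0)
    xs_tr = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    xs_te = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return (xs_tr, [a * x for x in xs_tr]), (xs_te, [a * x for x in xs_te])


def meta_train(steps=2000, seed=0):
    """Outer loop: differentiate the post-adaptation test loss through the inner step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        (xs_tr, ys_tr), (xs_te, ys_te) = sample_task(rng)
        w_adapted = adapt(w, xs_tr, ys_tr)
        # Chain rule: d L_te(w') / d w = L_te'(w') * (1 - ALPHA * L_tr''(w)).
        meta_grad = grad(w_adapted, xs_te, ys_te) * (1 - ALPHA * hess(xs_tr))
        w -= BETA * meta_grad
    return w


w0 = meta_train()
(xs_tr, ys_tr), (xs_te, ys_te) = sample_task(random.Random(123))
print("meta-learned init:", w0)
print("test loss before adaptation:", loss(w0, xs_te, ys_te))
print("test loss after one step:   ", loss(adapt(w0, xs_tr, ys_tr), xs_te, ys_te))
```

In this toy setting the meta-learned initialization ends up near the average task slope, and one inner step then moves it toward the new task's slope; real MAML applies the same two-level structure to neural network weights.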

Comment by capybaralet on capybaralet's Shortform · 2020-09-18T23:17:22.177Z · LW · GW

It seems like a lot of people are still thinking of alignment in overly binary terms, which leads to critical errors in thinking like: "there will be sufficient economic incentives to solve alignment", and "once alignment is a bottleneck, nobody will want to deploy unaligned systems, since such a system won't actually do what they want".

It seems clear to me that:

1) These statements are true for a certain level of alignment, which I've called "approximate value learning" in the past. I think I might have also referred to it as "pretty good alignment" or "good enough alignment" at various times.

2) This level of alignment is suboptimal from the point of view of x-safety, since the downside risk of extinction for the actors deploying the system is less than the downside risk of extinction summed over all humans.

3) We will develop techniques for "good enough" alignment before we develop techniques that are acceptable from the standpoint of x-safety.

4) Therefore, the expected outcome is: once "good enough alignment" is developed, a lot of actors deploy systems that are aligned enough for those actors to benefit from them, but that still carry an unacceptably high level of x-risk.

5) Thus, if we don't improve alignment techniques quickly enough after developing "good enough alignment", its development will likely lead to a period of increased x-risk (under the "alignment bottleneck" model).
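The gap in point 2 can be shown with a toy expected-value calculation. This is a minimal sketch under loudly simplified assumptions: all numbers are hypothetical, the same benefit term is used in both comparisons, and the deployment rule is plain expected value.

```python
def deploys(benefit, p_extinction, loss_if_extinct):
    """Toy expected-value rule: deploy iff benefit exceeds the expected loss from extinction."""
    return benefit > p_extinction * loss_if_extinct


BENEFIT = 1.0       # hypothetical benefit of deploying (arbitrary units)
ACTOR_LOSS = 100.0  # hypothetical loss to the deploying actor if extinction occurs
TOTAL_LOSS = 1e9    # hypothetical loss summed over all humans

for p in (1e-10, 1e-5, 0.05):
    actor_ok = deploys(BENEFIT, p, ACTOR_LOSS)
    society_ok = deploys(BENEFIT, p, TOTAL_LOSS)
    print(f"p={p:g}: actor deploys={actor_ok}, acceptable for all humans={society_ok}")
```

The middle case is the worrying one: the actor's private calculation says deploy, while the calculation that counts everyone's downside says don't. Any system whose risk falls between the two break-even points (here 1e-9 and 1e-2) is "good enough" for its deployer but not from the standpoint of x-safety.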