Posts

Any work on honeypots (to detect treacherous turn attempts)? 2020-11-12T05:41:56.371Z
When was the term "AI alignment" coined? 2020-10-21T18:27:56.162Z
Has anyone researched specification gaming with biological animals? 2020-10-21T00:20:01.610Z
Is there any work on incorporating aleatoric uncertainty and/or inherent randomness into AIXI? 2020-10-04T08:10:56.400Z
capybaralet's Shortform 2020-08-27T21:38:18.144Z
A reductio ad absurdum for naive Functional/Computational Theory-of-Mind (FCToM). 2020-01-02T17:16:35.566Z
A list of good heuristics that the case for AI x-risk fails 2019-12-02T19:26:28.870Z
What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address. 2019-12-02T18:20:47.530Z
A fun calibration game: "0-hit Google phrases" 2019-11-21T01:13:10.667Z
Can indifference methods redeem person-affecting views? 2019-11-12T04:23:10.011Z
What are the reasons to *not* consider reducing AI-Xrisk the highest priority cause? 2019-08-20T21:45:12.118Z
Project Proposal: Considerations for trading off capabilities and safety impacts of AI research 2019-08-06T22:22:20.928Z
False assumptions and leaky abstractions in machine learning and AI safety 2019-06-28T04:54:47.119Z
Let's talk about "Convergent Rationality" 2019-06-12T21:53:35.356Z
X-risks are a tragedies of the commons 2019-02-07T02:48:25.825Z
My use of the phrase "Super-Human Feedback" 2019-02-06T19:11:11.734Z
Thoughts on Ben Garfinkel's "How sure are we about this AI stuff?" 2019-02-06T19:09:20.809Z
The role of epistemic vs. aleatory uncertainty in quantifying AI-Xrisk 2019-01-31T06:13:35.321Z
Imitation learning considered unsafe? 2019-01-06T15:48:36.078Z
Conceptual Analysis for AI Alignment 2018-12-30T00:46:38.014Z
Disambiguating "alignment" and related notions 2018-06-05T15:35:15.091Z
Problems with learning values from observation 2016-09-21T00:40:49.102Z
Risks from Approximate Value Learning 2016-08-27T19:34:06.178Z
Inefficient Games 2016-08-23T17:47:02.882Z
Should we enable public binding precommitments? 2016-07-31T19:47:05.588Z
A Basic Problem of Ethics: Panpsychism? 2015-01-27T06:27:20.028Z
A Somewhat Vague Proposal for Grounding Ethics in Physics 2015-01-27T05:45:52.991Z

Comments

Comment by capybaralet on Any work on honeypots (to detect treacherous turn attempts)? · 2020-11-16T21:49:49.717Z · LW · GW

I strongly disagree.  
I think this is emblematic of the classic AI safety perspective/attitude, which has impeded and discouraged practical progress towards reducing AI x-risk by supporting an unnecessary and misleading emphasis on "ultimate solutions" that address the "arbitrarily intelligent agent trapped in a computer" threat model.
This is an important threat model, but it is just one of many.

My question is inspired by the situation where a scaled up GPT-3-like model is fine-tuned using RL and/or reward modelling.  In this case, it seems like we can honeypot the model during the initial training and have a good chance of catching it attempting a premature treacherous turn.  Whether or not the model would attempt a premature treacherous turn seems to depend on several factors.  
A hand-wavy argument for this strategy working is: an AI should conceive of the treacherous turn strategy before the honeypot counter-strategy because a counter-strategy presupposes the strategy it counters.

There are several reasons that make this not a brilliant research opportunity. Firstly, what is and is not a honeypot is sensitively dependent on the AI's capabilities and situation. There is no such thing as a one-size-fits-all honeypot.

I am more sympathetic to this argument, but it doesn't prevent us from doing research that is limited to specific situations.  It also proves too much, since combining this line of reasoning with no-free-lunch arguments would seem to invalidate all of machine learning.

Comment by capybaralet on Tips for the most immersive video calls · 2020-10-29T04:00:34.477Z · LW · GW

Any tips for someone who's already bought the C920 and isn't happy with the webcam on their computer?  (e.g. details on the 2-hour process :P)

Comment by capybaralet on Has anyone researched specification gaming with biological animals? · 2020-10-21T00:21:20.822Z · LW · GW

There are probably a lot of things that people do with animals that can be viewed as "automatic training", but I don't think people are viewing them this way, or trying to create richer reward signals that would encourage the animals to demonstrate increasingly impressive feats of intelligence.

Comment by capybaralet on Industrial literacy · 2020-10-05T20:17:51.447Z · LW · GW

The claim I'm objecting to is:

all soil loses its fertility naturally over time


I guess your interpretation of "naturally" is "when non-sustainably farmed"? ;) 

My impression is that we know how to keep farmland productive without using fertilizers by rotating crops, letting fields lie fallow sometimes, and involving fauna.  Of course, this might be much less efficient than using synthetic fertilizers, so I'm not saying that's what we should be doing. 

Comment by capybaralet on Is there any work on incorporating aleatoric uncertainty and/or inherent randomness into AIXI? · 2020-10-05T07:11:52.802Z · LW · GW

Is there a reference for this?

I was inspired to think of this by this puzzle (which I interpret as being about the distinction between epistemic and aleatoric uncertainty):

"""
"To present another example, suppose that five tosses of a given coin are planned and that the agent has equal strength of belief for two outcomes, both beginning with H, say the outcomes HTTHT and HHTTH. Suppose the first toss is made, and results in a head. If all that the agent learns is that a head occurred on the first toss it seems unreasonable for him to move to a greater confidence in the occurrence of one sequence rather than another. The only thing he has found out is something which is logically implied by both propositions, and hence, it seems plausible to say, fails to differentiate between them.

This second example might be challenged along the following lines: The case might be one in which initially the agent is moderately confident that the coin is either biased toward heads or toward tails. But he has as strong a belief that the bias is the one way as the other. So initially he has the same degree of confidence that H will occur as that T will occur on any given toss, and so, by symmetry considerations, an equal degree of confidence in HTTHT and HHTTH. Now if H is observed on the first toss it is reasonable for the agent to have slightly more confidence that the coin is biased toward heads than toward tails. And if so it might seem he now should have more confidence that the sequence should conclude with the results HTTH than TTHT because the first of these sequence has more heads in it than tails."

Which is right?
"""

What's striking to me is that the 2nd argument seems clearly correct, but only seems to work if you make a distinction between epistemic and aleatoric uncertainty, which I don't think AIXI does.  So that makes me wonder if it's doing something wrong (or if people who use Beta distributions to model coin flips are(!!))
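
To make the distinction concrete, here's a minimal sketch (my own toy example, not anything from the puzzle or from AIXI; the function names are just placeholders): a fixed-bias coin model has only aleatoric uncertainty, so observing a head doesn't change its prediction for the next toss, while a Beta prior over the bias encodes epistemic uncertainty and does update.

```python
# Minimal sketch (my own toy example): purely aleatoric uncertainty (known bias)
# vs. epistemic + aleatoric uncertainty (Beta prior over the bias).

def next_head_prob_fixed(p=0.5, heads_seen=0, tails_seen=0):
    # Purely aleatoric: the bias is known exactly, so observations change nothing.
    return p

def next_head_prob_beta(a=1.0, b=1.0, heads_seen=0, tails_seen=0):
    # Epistemic uncertainty about the bias: posterior is Beta(a + heads, b + tails),
    # and the posterior predictive for the next toss is the posterior mean.
    return (a + heads_seen) / (a + heads_seen + b + tails_seen)

print(next_head_prob_fixed(heads_seen=1))  # 0.5 -- the observed head changes nothing
print(next_head_prob_beta(heads_seen=1))   # ~0.67 -- the observed head shifts the prediction
```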


 

Comment by capybaralet on Is there any work on incorporating aleatoric uncertainty and/or inherent randomness into AIXI? · 2020-10-05T07:03:55.118Z · LW · GW

Yeah that seems right.  But I'm not aware of any such work OTTMH.

Comment by capybaralet on Weird Things About Money · 2020-10-05T04:00:09.610Z · LW · GW

I really like this.  I read part 1 as being about the way the economy or society implicitly imposes additional pressures on individuals' utility functions.  Can you provide a reference for the theorem that Kelly bettors predominate?

EtA: an observation: the arguments for expected value maximization also assume infinite value is possible, which (modulo infinite-ethics-style concerns, a significant caveat...) also isn't realistic.
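
For concreteness, here's a rough simulation of the sort of claim I have in mind (my own sketch, assuming repeated independent even-money bets with win probability 0.6):

```python
# Rough sketch (my assumptions: repeated even-money bets, win probability p = 0.6).
# A Kelly bettor stakes the Kelly fraction 2p - 1 = 0.2 of their bankroll each round;
# an expected-wealth maximizer stakes everything, which maximizes expected wealth
# but almost surely goes bust.
import random

def simulate(fraction, p=0.6, rounds=1000, bankroll=1.0):
    for _ in range(rounds):
        stake = fraction * bankroll
        bankroll += stake if random.random() < p else -stake
    return bankroll

random.seed(0)
kelly = [simulate(fraction=0.2) for _ in range(100)]
all_in = [simulate(fraction=1.0) for _ in range(100)]

print(sum(w > 1.0 for w in kelly))   # ~100: nearly every Kelly bettor grows their bankroll
print(sum(w > 0.0 for w in all_in))  # 0: every all-in bettor is wiped out
```

The all-in strategy still has higher *expected* wealth; as I understand it, the "Kelly bettors predominate" claim is about what happens with probability approaching 1 (wealth share), not about expectations.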


 

Comment by capybaralet on AGI safety from first principles: Control · 2020-10-04T08:38:23.886Z · LW · GW

Which previous arguments are you referring to?

Comment by capybaralet on Industrial literacy · 2020-10-04T08:27:08.134Z · LW · GW

That the food you eat is grown using synthetic fertilizers, and that this is needed for agricultural productivity, because all soil loses its fertility naturally over time if it is not deliberately replenished.

This claim doesn't make sense.  If it were true, plants would not have survived to the present day.

Steelmanning (which I would say OP doesn't do a good job of...), I'll interpret this as: "we are technologically reliant on synthetic fertilizers to grow enough food to feed the current population".  But in any case, there are harmful environmental consequences to our current practice that seem somewhat urgent to address: https://en.wikipedia.org/wiki/Haber_process#Economic_and_environmental_aspects

Comment by capybaralet on capybaralet's Shortform · 2020-10-04T02:19:21.593Z · LW · GW

Some possible implications of more powerful AI/technology for privacy:

1) It's as if all of your logged data gets pored over by a team of super-detectives to make informed guesses about every aspect of your life, even those that seem completely unrelated to those kinds of data.

2) Even data that you try to hide can be read from things like reverse engineering what you type based on the sounds of you typing, etc.

3) Powerful actors will deploy advanced systems to model, predict, and influence your behavior, and extreme privacy precautions starting now may be warranted.

4) On the other hand, if you DON'T have a significant digital footprint, you may be treated as significantly less trustworthy.  If AI systems don't know what to make of you, you may be the first up against the wall (compare with seeking credit without having a credit history).
 
5) On the other other hand ("on the foot"?), if you trust that future societies will be more enlightened, then you may be retroactively rewarded for being more enlightened today.

Anything important I left out?

Comment by capybaralet on capybaralet's Shortform · 2020-10-03T23:45:37.222Z · LW · GW

https://thischangedmypractice.com/move-an-injury-not-rice/
https://www.stoneclinic.com/blog/why-rice-not-always-nice-and-some-thoughts-about-swelling

Comment by capybaralet on capybaralet's Shortform · 2020-10-03T23:44:23.049Z · LW · GW

We learned about RICE as a treatment for injuries (e.g. sprains) in middle school, and it's since struck me as odd that you would want to inhibit the body's natural healing response.

It seems like RICE is being questioned by medical professionals, as well, but consensus is far off.

Anyone have thoughts/knowledge about this?

Comment by capybaralet on capybaralet's Shortform · 2020-09-30T08:04:44.814Z · LW · GW

Whelp... that's scary:

Chip Huyen (@chipro): "4. You won't need to update your models as much. One mindboggling fact about DevOps: Etsy deploys 50 times/day. Netflix 1000s times/day. AWS every 11.7 seconds. MLOps isn't an exemption. For online ML systems, you want to update them as fast as humanly possible. (5/6)"

https://twitter.com/chipro/status/1310952553459462146

Comment by capybaralet on Inviting Curated Authors to Give 5-Min Online Talks · 2020-09-24T02:48:41.323Z · LW · GW

I think I've had a few curated posts.  How could I find them?

Comment by capybaralet on Radical Probabilism [Transcript] · 2020-09-24T02:46:47.914Z · LW · GW

Abram Demski: But it's like, how do you do that if “I don't have a good hypothesis” doesn't make any predictions?

One way you can imagine this working is that you treat “I don't have a good hypothesis” as a special hypothesis that is not required to normalize to 1.  
For instance, it could say that observing any particular real number, r, has probability epsilon > 0.
So now it "makes predictions", but this doesn't just collapse to including another hypothesis and using Bayes rule.

You can also imagine updating this special hypothesis (which I called a "Socratic hypothesis" in comments on the original blog post on Radical Probabilism) in various ways. 

Comment by capybaralet on [AN #118]: Risks, solutions, and prioritization in a world with many AI systems · 2020-09-24T02:24:20.762Z · LW · GW

Regarding ARCHES, as an author:

  • I disagree with Critch that we should expect single/single delegation(/alignment) to be solved "by default" because of economic incentives.  I think economic incentives will not lead to it being solved well enough, soon enough (e.g. see:
     https://www.lesswrong.com/posts/DmLg3Q4ZywCj6jHBL/capybaralet-s-shortform?commentId=wBc2cZaDEBX2rb4GQ).  I guess Critch might put this in the "multi/multi" camp, but I think it's more general (e.g. I attribute a lot of the risk here to human irrationality/carelessness).
  • RE: "I find the argument less persuasive because we do have governance, regulations, national security etc. that would already be trying to mitigate issues that arise in multi-multi contexts, especially things that could plausibly cause extinction"... 1) These are all failing us when it comes to, e.g. climate change.  2) I don't think we should expect our institutions to keep up with rapid technological progress (you might say they are already failing to...).  My thought experiment from the paper is: "imagine if everyone woke up 1000000x smarter tomorrow."  Our current institutions would likely not survive the day and might or might not be improved quickly enough to keep ahead of bad actors / out-of-control conflict spirals.
     
Comment by capybaralet on [AN #118]: Risks, solutions, and prioritization in a world with many AI systems · 2020-09-24T02:06:15.480Z · LW · GW

these usually don’t assume “no intervention from longtermists”

I think the "don't" is a typo?

Comment by capybaralet on Why GPT wants to mesa-optimize & how we might change this · 2020-09-23T21:30:30.802Z · LW · GW

By managing incentives I expect we can, in practice, do things like: "[telling it to] restrict its lookahead to particular domains"... or remove any incentive for control of the environment.

I think we're talking past each other a bit here.

Comment by capybaralet on capybaralet's Shortform · 2020-09-23T21:27:00.273Z · LW · GW

For all of the hubbub about trying to elaborate better arguments for AI x-risk, it seems like a lot of people are describing the arguments in Superintelligence as relying on FOOM, agenty AI systems, etc. without actually justifying that description via references to the text.

It's been a while since I read Superintelligence, but my memory was that it anticipated a lot of counter-arguments quite well.  I'm not convinced that it requires such strong premises to make a compelling case.  So maybe someone interested in this project of clarifying the arguments should start with establishing that the arguments in Superintelligence really have the weaknesses they are claimed to?

Comment by capybaralet on Why GPT wants to mesa-optimize & how we might change this · 2020-09-23T06:19:13.859Z · LW · GW

My intuitions on this matter are:
1) Stopping mesa-optimizing completely seems mad hard.
2) Managing "incentives" is the best way to deal with this stuff, and will probably scale to something like 1,000,000x human intelligence. 
3) On the other hand, it probably won't scale forever.

To elaborate on the incentive management thing... if we figure that stuff out and do it right and it has the promise that I think it does... then it won't restrict lookahead to particular domains, but it will remove incentives for instrumental goal seeking.  

If we're still in a situation where the AI doesn't understand its physical environment and isn't incentivized to learn to control it, then we can do simple things like use a fixed dataset (as opposed to data we're collecting online) in order to make it harder for the AI to learn anything significant about its physical environment. 

Learning about the physical environment and using it to improve performance is not necessarily bad/scary absent incentives for control.  However, I worry that having a good world model makes an AI much more liable to infer that it should try to control and not just predict the world.

Comment by capybaralet on capybaralet's Shortform · 2020-09-23T06:10:12.367Z · LW · GW

Moloch is not about coordination failures.  Moloch is about the triumph of instrumental goals.  Maybe we can defeat Moloch with sufficiently good coordination.  It's worth a shot at least.

Comment by capybaralet on capybaralet's Shortform · 2020-09-22T00:36:51.522Z · LW · GW

Treacherous turns don't necessarily happen all at once. An AI system can start covertly recruiting resources outside its intended purview in preparation for a more overt power grab.

This can happen during training, without a deliberate "deployment" event. Once the AI has started recruiting resources, it can outperform AI systems that haven't done so on-distribution, with resources left over which it can devote to pursuing its true objective or instrumental goals.

Comment by capybaralet on Why GPT wants to mesa-optimize & how we might change this · 2020-09-21T00:11:34.878Z · LW · GW

I didn't read the post (yet...), but I'm immediately skeptical of the claim that beam search is useful here ("in principle"), since GPT-3 is just doing next step prediction (it is never trained on its own outputs, IIUC). This means it should always just match the conditional P(x_t | x_1, .., x_{t-1}). That conditional itself can be viewed as being informed by possible future sequences, but conservation of expected evidence says we shouldn't be able to gain anything by doing beam search if we already know that conditional. Now it's true that efficiently estimating that conditional using a single forward pass of a transformer might involve approximations to beam search sometimes.
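
To illustrate the distinction I'm pointing at, here's a toy sketch (entirely my own, not a claim about GPT-3's internals): in a tiny Markov chain whose next-token conditional is known exactly, sampling from the conditional reproduces the sequence distribution, whereas beam search targets a different object, namely the (approximately) most likely single sequence.

```python
# Toy sketch: sampling from a known conditional vs. beam search over it.
import random

TOKENS = ["a", "b"]
# The true next-token conditional P(next | prev), known exactly.
COND = {"a": {"a": 0.6, "b": 0.4}, "b": {"a": 0.5, "b": 0.5}}

def sample_sequence(start="a", length=3):
    seq = [start]
    for _ in range(length):
        prev = seq[-1]
        seq.append(random.choices(TOKENS, weights=[COND[prev][t] for t in TOKENS])[0])
    return "".join(seq)

def beam_search(start="a", length=3, beam_width=2):
    beams = [([start], 1.0)]
    for _ in range(length):
        candidates = [(seq + [t], p * COND[seq[-1]][t]) for seq, p in beams for t in TOKENS]
        beams = sorted(candidates, key=lambda c: -c[1])[:beam_width]
    return "".join(beams[0][0]), beams[0][1]

random.seed(0)
print(beam_search())  # ('aaaa', ~0.216): the single most likely continuation
samples = [sample_sequence() for _ in range(10000)]
print(samples.count("aaaa") / len(samples))  # ~0.216: sampling already matches that probability
```

So beam search here isn't extracting extra information from the conditional; it's answering a different question (mode-finding rather than matching the distribution).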

At a high level, I don't think we really need to be concerned with this form of "internal lookahead" unless/until it starts to incorporate mechanisms outside of the intended software environment (e.g. the hardware, humans, the external (non-virtual) world).

Comment by capybaralet on Why GPT wants to mesa-optimize & how we might change this · 2020-09-21T00:02:00.728Z · LW · GW

Seq2seq used beam search and found it helped (https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43155.pdf). It was standard practice in the early days of NMT; I'm not sure when that changed.

This blog post gives some insight into why beam search might not be a good idea, and is generally very interesting: https://benanne.github.io/2020/09/01/typicality.html

Comment by capybaralet on Radical Probabilism · 2020-09-20T23:57:32.509Z · LW · GW

This blog post seems superficially similar, but I can't say ATM if there are any interesting/meaningful connections:

https://www.inference.vc/the-secular-bayesian-using-belief-distributions-without-really-believing/

Comment by capybaralet on AI Research Considerations for Human Existential Safety (ARCHES) · 2020-09-19T04:54:37.962Z · LW · GW

There is now also an interview with Critch here: https://futureoflife.org/2020/09/15/andrew-critch-on-ai-research-considerations-for-human-existential-safety/

Comment by capybaralet on capybaralet's Shortform · 2020-09-18T23:30:30.114Z · LW · GW

A lot of the discussion of mesa-optimization seems confused.

One thing that might be relevant towards clearing up the confusion is just to remember that "learning" and "inference" should not be thought of as cleanly separated in the first place; see, e.g., AIXI...

So when we ask "is it learning? Or just solving the task without learning", this seems like a confused framing to me. Suppose your ML system learned an excellent prior, and then just did Bayesian inference at test time. Is that learning? Sure, why not. It might not use a traditional search/optimization algorithm, but probably is has to do *something* like that for computational reasons if it wants to do efficient approximate Bayesian inference over a large hypothesis space, so...

Comment by capybaralet on Developmental Stages of GPTs · 2020-09-18T23:25:57.570Z · LW · GW

Sometimes people will give GPT-3 a prompt with some examples of inputs along with the sorts of responses they'd like to see from GPT-3 in response to those inputs ("few-shot learning", right? I don't know what 0-shot learning you're referring to.)

No, that's zero-shot. Few-shot is when you train on those instead of just stuffing them into the context.

It looks like mesa-optimization because it seems to be doing something like learning about new tasks or new prompts that are very different from anything its seen before, without any training, just based on the context (0-shot).

Is your claim that GPT-3 succeeds at this sort of task by doing something akin to training a model internally?

By "training a model", I assume you mean "a ML model" (as opposed to, e.g. a world model). Yes, I am claiming something like that, but learning vs. inference is a blurry line.

I'm not saying it's doing SGD; I don't know what it's doing in order to solve these new tasks. But TBC, 96 steps of gradient descent could be a lot. MAML does meta-learning with 1.

Comment by capybaralet on capybaralet's Shortform · 2020-09-18T23:17:22.177Z · LW · GW

It seems like a lot of people are still thinking of alignment as too binary, which leads to critical errors in thinking like: "there will be sufficient economic incentives to solve alignment", and "once alignment is a bottleneck, nobody will want to deploy unaligned systems, since such a system won't actually do what they want".

It seems clear to me that:

1) These statements are true for a certain level of alignment, which I've called "approximate value learning" in the past (https://www.lesswrong.com/posts/rLTv9Sx3A79ijoonQ/risks-from-approximate-value-learning). I think I might have also referred to it as "pretty good alignment" or "good enough alignment" at various times.

2) This level of alignment is suboptimal from the point of view of x-safety, since the downside risk of extinction for the actors deploying the system is less than the downside risk of extinction summed over all humans.

3) We will develop techniques for "good enough" alignment before we develop techniques that are acceptable from the standpoint of x-safety.

4) Therefore, the expected outcome is: once "good enough alignment" is developed, a lot of actors deploy systems that are aligned enough for them to benefit from them, but still carry an unacceptably high level of x-risk.

5) Thus if we don't improve alignment techniques quickly enough after developing "good enough alignment", its development will likely lead to a period of increased x-risk (under the "alignment bottleneck" model).

Comment by capybaralet on capybaralet's Shortform · 2020-09-18T23:06:15.180Z · LW · GW

No, I'm talking about it breaking out during training. The only "shifts" here are:

1) the AI gets smarter

2) (perhaps) the AI covertly influences its external environment (i.e. breaks out of the box a bit).

We can imagine scenarios where it's only (1) and not (2). I find them a bit more far-fetched, but this is the classic vision of the treacherous turn... the AI makes a plan, and then suddenly executes it to attain DSA. Once it starts to execute, ofc there is distributional shift, but:

A) it is auto-induced distributional shift

B) the developers never decided to deploy

Comment by capybaralet on [AN #115]: AI safety research problems in the AI-GA framework · 2020-09-17T21:19:01.438Z · LW · GW

MAIEI also has an AI Ethics newsletter I recommend for those interested in the topic.

Comment by capybaralet on [AN #115]: AI safety research problems in the AI-GA framework · 2020-09-17T21:17:51.415Z · LW · GW

I actually expect that the work needed for the open-ended search paradigm will end up looking very similar to the work needed by the “AGI via deep RL” paradigm: the differences I see are differences in difficulty, not differences in what problems qualitatively need to be solved.

I'm inclined to agree. I wonder if there are any distinctive features that jump out?

Comment by capybaralet on [AN #114]: Theory-inspired safety solutions for powerful Bayesian RL agents · 2020-09-17T21:11:03.422Z · LW · GW

Regarding curriculum learning: I think it's very neglected, and seems likely to be a core component of prosaic alignment approaches. The idea of a "basin of attraction for corrigibility (or other desirable properties)" seems likely to rely on appropriate choice of curriculum.

Comment by capybaralet on capybaralet's Shortform · 2020-09-17T19:39:33.985Z · LW · GW

I'm frustrated with the meme that "mesa-optimization/pseudo-alignment is a robustness (i.e. OOD) problem". IIUC, this is definitionally true in the mesa-optimization paper, but I think this misses the point.

In particular, this seems to exclude an important (maybe the most important) threat model: the AI understands how to appear aligned, and does so, while covertly pursuing its own objective on-distribution, during training.

This is exactly how I imagine a treacherous turn from a boxed superintelligent AI agent to occur, for instance. It secretly begins breaking out of the box (e.g. via manipulating humans) and we don't notice until it's too late.

Comment by capybaralet on Why is pseudo-alignment "worse" than other ways ML can fail to generalize? · 2020-09-17T19:20:20.943Z · LW · GW

I disagree with the framing that: "pseudo-alignment is a type of robustness/distributional shift problem". This is literally true based on how it's defined in the paper. But I think in practice, we should expect approximately aligned mesa-optimizers that do very bad things on-distribution (without being detected).

Comment by capybaralet on Mesa-Search vs Mesa-Control · 2020-09-17T19:02:43.306Z · LW · GW

I guess most of my cruxes are RE your 2nd "=>", and can almost be viewed as breaking down this question into sub-questions. It might be worth sketching out a quantitative model here.

Comment by capybaralet on Mesa-Search vs Mesa-Control · 2020-09-17T19:01:14.255Z · LW · GW

Yep. I'd love to see more discussion around these cruxes (e.g. I'd be up for a public or private discussion sometime, or moderating one with someone from MIRI). I'd guess some of the main underlying cruxes are:

  • How hard are these problems to fix?
  • How motivated will the research community be to fix them?
  • How likely will developers be to use the fixes?
  • How reliably will developers need to use the fixes? (e.g. how much x-risk would result from a small company *not* using them?)

Personally, OTTMH (numbers pulled out of my ass), my views on these cruxes are:

  • It's hard to say, but I'd say there's a ~85% chance they are extremely difficult (effectively intractable on short-to-medium (~40yrs) timelines).
  • A small minority (~1-20%) of researchers will be highly motivated to fix them, once they are apparent/prominent. More researchers (~10-80%) will focus on patches.
  • Conditioned on fixes being easy and cheap to apply, large orgs will be very likely to use them (~90%); small orgs less so (~50%). Fixes are likely to be easy to apply (we'll build good tools), if they are cheap enough to be deemed "practical", but very unlikely (~10%) to be cheap enough.
  • It will probably need to be highly reliable; "the necessary intelligence/resources needed to destroy the world goes down every year" (unless we make a lot of progress on governance, which seems fairly unlikely (~15%))

Comment by capybaralet on Mesa-Search vs Mesa-Control · 2020-09-17T07:59:10.276Z · LW · GW

I'm very curious to know whether people at MIRI in fact disagree with this claim.

I would expect that they don't... e.g. Eliezer seems to think we'll see them and patch them unsuccessfully: https://www.facebook.com/jefftk/posts/886930452142?comment_id=886983450932&comment_tracking=%7B%22tn%22%3A%22R%22%7D

Comment by capybaralet on Mesa-Search vs Mesa-Control · 2020-09-17T07:47:04.008Z · LW · GW

Practically speaking, I think the big difference is that the history is outside of GPT-3's control, but a recurrent memory would be inside its control.

Comment by capybaralet on Mesa-Search vs Mesa-Control · 2020-09-17T07:38:31.628Z · LW · GW
Comment by capybaralet on capybaralet's Shortform · 2020-09-17T06:11:28.481Z · LW · GW

It might be a passive request, I'm not actually sure... I'd think of it more like an invitation, which you are free to decline. Although OFC, declining an invitation does send a message whether you like it or not *shrug.

Comment by capybaralet on capybaralet's Shortform · 2020-09-16T23:04:19.358Z · LW · GW

I guess one problem here is that how someone responds to such a statement carries information about how much they respect you...

If someone you are honored to even get the time of day from writes that, you will almost certainly craft a strong response about X...

Comment by capybaralet on capybaralet's Shortform · 2020-09-16T23:03:02.965Z · LW · GW

I like "tell culture" and find myself leaning towards it more often these days, but e.g. as I'm composing an email, I'll find myself worrying that the recipient will just interpret a statement like: "I'm curious about X" as a somewhat passive request for information about X (which it sort of is, but also I really don't want it to come across that way...)

Anyone have thoughts/suggestions?

Comment by capybaralet on capybaralet's Shortform · 2020-09-16T09:22:07.097Z · LW · GW

As alignment techniques improve, they'll get good enough to solve new tasks before they get good enough to do so safely. This is a source of x-risk.

Comment by capybaralet on capybaralet's Shortform · 2020-09-15T23:53:17.695Z · LW · GW

What's our backup plan if the internet *really* goes to shit?

E.g. Google search seems to have suddenly gotten way worse for searching for machine learning papers for me in the last month or so. I'd gotten used to it being great, and don't have a good backup.

Comment by capybaralet on capybaralet's Shortform · 2020-09-15T23:50:56.763Z · LW · GW

A friend asked me what EAs think of https://en.wikipedia.org/wiki/Chuck_Feeney.

Here's my response (based on ~1 minute of Googling):

He seems to have what I call a "moral purity" attitude towards morality.
By this I mean, thinking of ethics as less consequentialist and more about "being a good person".

I think such an attitude is natural, very typical, and not very EA. So, e.g., living frugally might or might not be EA, but definitely makes sense if you believe we have strong charitable obligations and have a moral purity attitude towards morality.

Moving away from moral purity, and giving consequentialist arguments against it, is maybe one of the main insights of EA vs. Peter Singer.

Comment by capybaralet on TurnTrout's shortform feed · 2020-09-15T08:27:19.399Z · LW · GW

I found this fascinating... it's rare these days that I see some fundamental assumption in my thinking that I didn't even realize I was making laid bare like this... it is particularly striking because I think I could easily have realized that my own experience contradicts catharsis theory... I know that I can distract myself to become less angry, but I usually don't want to, in the moment.

I think that desire is driven by emotion, but rationalized via something like catharsis theory. I want to try and rescue catharsis theory by saying that maybe there are negative long-term effects of being distracted from feelings of anger (e.g. a build up of resentment). I wonder how much this is also a rationalization.

I also wonder how accurately the authors have characterized catharsis theory, and how much to identify it with the "hydraulic model of anger"... I would imagine that there are lots of attempts along the lines of what I suggested to try and rescue catharsis theory by refining or moving away from the hydraulic model. A highly general version might claim: "over a long time horizon, not 'venting' anger is net negative".

Comment by capybaralet on MakoYass's Shortform · 2020-09-15T07:51:32.701Z · LW · GW

The obvious bad consequence is a false sense of security leading people to just get BCIs instead of trying harder to shape (e.g. delay) AI development.

" You can't make horses competitive with cars by giving them exoskeletons. " <-- this reads to me like a separate argument, rather than a restatement of the one that came before.

I agree that BCI seems unlikely to be a good permanent/long-term solution, unless it helps us solve alignment, which I think it could. It could also just defuse a conflict between AIs and humans, leading us to gracefully give up our control over the future light cone instead of fighting a (probably losing) battle to retain it.


...Your post made me think more about my own (and others') reasons for rejecting Neuralink as a bad idea... I think there's a sense of "we're the experts and Elon is a n00b". This coupled with feeling a bit burned by Elon first starting his own AI safety org and then ditching it for this... overall doesn't feel great.

Comment by capybaralet on Tofly's Shortform · 2020-09-15T07:40:15.381Z · LW · GW

Regarding your "intuition that there should be some “best architecture”, at least for any given environment, and that this architecture should be relatively “simple”.", I think:

1) I'd say "task" rather than "environment", unless I wanted to emphasize that I think selection pressure trumps the orthogonality thesis (I'm ambivalent, FWIW).

2) I don't see why it should be "simple" (and relative to what?) in every case, but I sort of share this intuition for most cases...

3) On the other hand, I think any system with other agents probably is much more complicated (IIUC, a lot of people think social complexity drove selection pressure for human-level intelligence in a feedback loop). At a "gears level" the reason this creates an insatiable drive for greater complexity is that social dynamics can be quite winner-takes-all... if you're one step ahead of everyone else (and they don't realize it), then you can fleece them.

Comment by capybaralet on Tofly's Shortform · 2020-09-15T07:34:57.979Z · LW · GW

I don't think asymptotic reasoning is really the right tool for the job here.

We *know* things level off eventually because of physical limits (https://en.wikipedia.org/wiki/Limits_of_computation).

But fast takeoff is about how fast we go from where we are now to (e.g.) a superintelligence with a decisive strategic advantage (DSA). DSA probably doesn't require something near the physical limits of computation.