AI Safety Camp 2024 2023-11-18T10:37:02.183Z
Projects I would like to see (possibly at AI Safety Camp) 2023-09-27T21:27:29.539Z
Apply to lead a project during the next virtual AI Safety Camp 2023-09-13T13:29:09.198Z
How teams went about their research at AI Safety Camp edition 8 2023-09-09T16:34:05.801Z
Virtual AI Safety Unconference (VAISU) 2023-06-13T09:56:22.542Z
AISC end of program presentations 2023-06-06T15:45:04.873Z
Project Idea: Lots of Cause-area-specific Online Unconferences 2023-02-06T11:05:27.468Z
AI Safety Camp, Virtual Edition 2023 2023-01-06T11:09:07.302Z
Why don't we have self driving cars yet? 2022-11-14T12:19:09.808Z
How I think about alignment 2022-08-13T10:01:01.096Z
Infohazard Discussion with Anders Sandberg 2021-03-30T10:12:45.901Z
AI Safety Beginners Meetup (Pacific Time) 2021-03-04T01:44:33.856Z
AI Safety Beginners Meetup (European Time) 2021-02-20T13:20:42.748Z
AISU 2021 2021-01-30T17:40:38.292Z
Online AI Safety Discussion Day 2020-10-08T12:11:56.934Z
AI Safety Discussion Day 2020-09-15T14:40:18.777Z
Online LessWrong Community Weekend 2020-08-31T23:35:11.670Z
Online LessWrong Community Weekend, September 11th-13th 2020-08-01T14:55:38.986Z
AI Safety Discussion Days 2020-05-27T16:54:47.875Z
Announcing Web-TAISU, May 13-17 2020-04-04T11:48:14.128Z
Requesting examples of successful remote research collaborations, and information on what made it work? 2020-03-31T23:31:23.249Z
Coronavirus Tech Handbook 2020-03-21T23:27:48.134Z
[Meta] Do you want AIS Webinars? 2020-03-21T16:01:02.814Z
TAISU - Technical AI Safety Unconference 2020-01-29T13:31:36.431Z
Linda Linsefors's Shortform 2020-01-24T13:08:26.059Z
1st Athena Rationality Workshop - Retrospective 2019-07-17T16:51:36.754Z
Learning-by-doing AI Safety Research workshop 2019-05-24T09:42:49.996Z
TAISU - Technical AI Safety Unconference 2019-05-21T18:34:34.051Z
The Athena Rationality Workshop - June 7th-10th at EA Hotel 2019-05-11T01:01:01.973Z
The Athena Rationality Workshop - June 7th-10th at EA Hotel 2019-05-10T22:08:03.600Z
The Game Theory of Blackmail 2019-03-22T17:44:36.545Z
Optimization Regularization through Time Penalty 2019-01-01T13:05:33.131Z
Generalized Kelly betting 2018-07-19T01:38:21.311Z
Non-resolve as Resolve 2018-07-10T23:31:15.932Z
Repeated (and improved) Sleeping Beauty problem 2018-07-10T22:32:56.191Z
Probability is fake, frequency is real 2018-07-10T22:32:29.692Z
The Mad Scientist Decision Problem 2017-11-29T11:41:33.640Z
Extensive and Reflexive Personhood Definition 2017-09-29T21:50:35.324Z
Call for cognitive science in AI safety 2017-09-29T20:35:16.738Z
The Virtue of Numbering ALL your Equations 2017-09-28T18:41:35.631Z
Suggested solution to The Naturalized Induction Problem 2016-12-24T16:03:03.000Z
Suggested solution to The Naturalized Induction Problem 2016-12-24T15:55:16.000Z


Comment by Linda Linsefors on New Tool: the Residual Stream Viewer · 2023-10-26T16:26:29.666Z · LW · GW

It looks like this to me:

Where's the colourful text?
Is it broken or am I doing something wrong?

Comment by Linda Linsefors on Projects I would like to see (possibly at AI Safety Camp) · 2023-10-11T17:30:12.726Z · LW · GW

Potentially we might be ok with it if the expected timescale is long enough (or the probability of it happening in a given timescale is low enough).

Agreed. I'd love for someone to investigate the possibility of slowing down substrate-convergence enough for the problem to be basically solved.

If that's true then that is a super important finding! And also an important thing to communicate to people! I hear a lot of people who say the opposite and that we need lots of competing AIs.

Hm, to me this conclusion seems fairly obvious. I don't know how to communicate it though, since I don't know what the crux is. I'd be up for participating in a public debate about this, if you can find me an opponent. Although not until after AISC research lead applications are over and I've had some time to recover. So maybe late November at the earliest.

Comment by Linda Linsefors on [deleted post] 2023-10-09T23:36:49.197Z

I've made an edit to remove this part.

Comment by Linda Linsefors on [deleted post] 2023-10-09T23:34:11.383Z

Inner alignment asks the question - “Is the model trying to do what humans want it to do?”

This seems inaccurate to me. An AI can be inner aligned and still not aligned if we solve inner alignment but mess up outer alignment.

This text also shows up in the outer alignment tag: Outer Alignment - LessWrong 

Comment by Linda Linsefors on Projects I would like to see (possibly at AI Safety Camp) · 2023-09-29T10:29:45.491Z · LW · GW
  • An approach could be to say under what conditions natural selection will and will not sneak in. 


  • Natural selection requires variation. Information theory tells us that all information is subject to noise and therefore variation across time. However, we can reduce error rates to arbitrarily low probabilities using coding schemes. Essentially this means that it is possible to propagate information across finite timescales with arbitrary precision. If there is no variation then there is no natural selection. 

Yes! The big question to me is whether we can reduce error rates enough. And "error rates" here means not just hardware signal error, but also randomness that comes from interacting with the environment.
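The quoted point about coding schemes can be illustrated with the simplest possible scheme, a repetition code with majority voting. This is only a sketch under stated assumptions: the function name `majority_error` and the independent bit-flip noise model are hypothetical choices for illustration, and the example covers signal noise only, not randomness from interacting with the environment.

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Probability that a majority vote over n noisy copies of one bit is wrong.

    Assumes each copy is flipped independently with probability p
    (an illustrative noise model, not taken from the discussion).
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so the vote cannot tie")
    # The vote fails when more than half of the copies are flipped.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 11, 51):
        print(n, majority_error(0.1, n))
```

For any flip probability below 0.5 the failure probability shrinks as the number of copies grows, at the cost of more redundancy. Whether the same trick can cover environment-induced variation is exactly the open question.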

  • In abstract terms, evolutionary dynamics require either a smooth adaptive landscape such that incremental changes drive organisms towards adaptive peaks and/or unlikely leaps away from local optima into attraction basins of other optima. In principle AI systems could exist that stay in safe local optima and/or have very low probabilities of jumps to unsafe attraction basins. 

It has to be smooth relative to the jumps that can be achieved by whatever is generating the variation. Natural mutations don't typically make large jumps. But if you have a small change in motivation for an intelligent system, this may cause a large shift in behaviour.

  • I believe that natural selection requires a population of "agents" competing for resources. If we only had a single AI system then there is no competition and no immediate adaptive pressure.

I thought so too to start with. I still don't know what the right conclusion is, but I think that substrate-needs convergence is at least still a risk even with a singleton. Something that is smart enough to be a general intelligence is probably complex enough to have internal parts that operate semi-independently, and therefore these parts can compete with each other.

I think the singleton scenario is the most interesting, since I think that if we have several competing AIs, then we are just super doomed.

And by singleton I don't necessarily mean a single entity. It could also be a single alliance. The boundary between group and individual might not be as clear with AIs as with humans.

  • Other dynamics will be at play which may drown out natural selection. There may be dynamics that occur at much faster timescales than this kind of natural selection, such that adaptive pressure towards resource accumulation cannot get a foothold. 

This will probably be correct for a time. But will it be true forever? One of the possible end goals for alignment research is to build the aligned superintelligence that saves us all. If substrate convergence is true, then this end goal is off the table. Because even if we reach this goal, it will inevitably start to either value drift towards self-replication, or get eaten from the inside by parts that have mutated towards self-replication (AI cancer), or something like that.

  • Other dynamics may be at play that can act against natural selection. We see existence-proofs of this in immune responses against tumours and cancers. Although these don't work perfectly in the biological world, perhaps an advanced AI could build a type of immune system that effectively prevents individual parts from undergoing runaway self-replication. 

Cancer is an excellent analogy. Humans defeat it in a few ways that work together:

  1. We have evolved to have cells that mostly don't defect.
  2. We have an evolved immune system that attacks cancer when it does happen.
  3. We have developed technology to help us find and fight cancer when it happens.
  4. When someone gets cancer anyway and it can't be defeated, only they die; it doesn't spread to other individuals.

Point 4 is very important. If there is only one agent, this agent needs perfect cancer fighting ability to avoid being eaten by natural selection. The big question to me is: Is this possible?

If you on the other hand have several agents, then you definitely don't escape natural selection, because these entities will compete with each other.


Comment by Linda Linsefors on Rationality: From AI to Zombies · 2023-09-27T21:18:03.206Z · LW · GW

I got into AI Safety. My interest in AI Safety lured me to a CFAR workshop, since it was a joint event with MIRI. I came for the Agent Foundations research, but the CFAR part turned out to be just as valuable. It helped me start to integrate my intuitions with my reasoning, through IDC and other methods. I'm still in AI Safety, mostly organising, but also doing some thinking, and still learning.

My resume lists all the major things I've been doing. Not the most interesting format, but I'm probably not going to write anything better anytime soon.
Resume - Linda Linsefors - Google Docs

Comment by Linda Linsefors on Steering GPT-2-XL by adding an activation vector · 2023-09-26T10:41:12.492Z · LW · GW

We don't know why the +2000 vector works but the +100 vector doesn't. 

My guess is it's because in the +100 case the vectors are very similar, causing their difference to be something un-natural.

"I talk about weddings constantly" and "I do not talk about weddings constantly" are technically opposites. But if you imagine someone saying this, you notice that their natural language meaning is almost identical.

What sort of person says "I do not talk about weddings constantly"? That sounds to me like someone who talks about weddings almost constantly. Why else would they feel the need to say that?
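The guess above can be made concrete with a toy example. This is not GPT-2-XL: the vectors are random stand-ins, and the names `similar_pair_demo`, `shared`, and `noise` are hypothetical. The sketch only shows that when two vectors are nearly identical, their difference is tiny and points in a direction unrelated to what they have in common.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_pair_demo(dim=1000, noise=0.01, seed=0):
    """Build two almost identical vectors and inspect their difference.

    Returns (cosine(a, b), cosine(a - b, shared)). Random stand-ins only,
    not real model activations.
    """
    rng = random.Random(seed)
    shared = [rng.gauss(0, 1) for _ in range(dim)]
    a = [s + noise * rng.gauss(0, 1) for s in shared]
    b = [s + noise * rng.gauss(0, 1) for s in shared]
    diff = [x - y for x, y in zip(a, b)]
    return cosine(a, b), cosine(diff, shared)


if __name__ == "__main__":
    sim_ab, sim_diff = similar_pair_demo()
    print("cosine(a, b)         =", sim_ab)
    print("cosine(a - b, shared) =", sim_diff)
```

Scaling such a difference by a large coefficient amplifies whatever small residual distinguishes the two vectors, which in this toy setting is just noise. Whether this is what happens with the actual wedding vectors is an open guess.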

Comment by Linda Linsefors on Steering GPT-2-XL by adding an activation vector · 2023-09-26T10:16:23.600Z · LW · GW

To steer a forward pass with the "wedding" vector, we start running an ordinary GPT-2-XL forward pass on the prompt "I love dogs" until layer 6. Right before layer 6 begins, we now add in the cached residual stream vectors from before:

I have a question about the image above this text.

Why do you add the embedding from the [<|endoftext|> -> "The"] stream? This part has no information about weddings.

Comment by Linda Linsefors on AI presidents discuss AI alignment agendas · 2023-09-26T09:45:54.003Z · LW · GW

I had a bit of trouble hearing the difference in voice between Trump and Biden, at the start. I solved this by actually imagining the presidents. Not visually, since I'm not a visual person, just loading up the general gestalt of their voices and typical way of speaking into my working memory. 

Another way to put it: When I asked myself "which of the voices I've heard so far is this" I sometimes could not tell. But when I asked myself "who is this among Obama, Trump and Biden" it was always clear.

Comment by Linda Linsefors on Meta Questions about Metaphilosophy · 2023-09-25T19:09:59.186Z · LW · GW

If you think it would be helpful, you are welcome to suggest a metaphilosophy topic for AI Safety Camp.

More info at (I'm typing on a phone, I'll add the actual link later if I remember to)

Comment by Linda Linsefors on Sharing Information About Nonlinear · 2023-09-13T13:05:43.144Z · LW · GW

This is a good point. I was thinking in terms of legal vs informal, not in terms of written vs verbal. 

I agree that having something written down is basically always better. Both for clarity, as you say, and because people's memories are not perfect. And it has the added bonus that if there is a conflict, you have something to refer back to.

Comment by Linda Linsefors on Sharing Information About Nonlinear · 2023-09-13T12:52:04.439Z · LW · GW

Thanks for adding your perspective. 

If @Rob Bensinger does in fact cross-post Linda's comment, I request he cross-posts this, too.

I agree with this.

Comment by Linda Linsefors on Sharing Information About Nonlinear · 2023-09-13T12:48:31.838Z · LW · GW

I'm glad you liked it. You have my permission to cross post.

Comment by Linda Linsefors on Sharing Information About Nonlinear · 2023-09-10T14:07:39.816Z · LW · GW

Thanks for writing this post.

I've heard enough bad stuff about Nonlinear from before, that I was seriously concerned about them. But I did not know what to do. Especially since part of their bad reputation is about attacking critics, and I don't feel well positioned to take that fight.

I'm happy some of these accusations are now out in the open. If it's all wrong and Nonlinear is blame free, then this is their chance to clear their reputation. 

I can't say that I will withhold judgment until more evidence comes in, since I already made a preliminary judgment even before this post. But I can promise to be open to changing my mind. 

Comment by Linda Linsefors on Sharing Information About Nonlinear · 2023-09-10T13:48:57.533Z · LW · GW

I have worked without legal contracts for people in EA I trust, and it has worked well.

Even if all the accusations against Nonlinear are true, I still have pretty high trust in people in EA or LW circles, such that I would probably agree to work with no formal contract again.

The reason I trust people in my ingroup is that if either of us screws over the other person, I expect the victim to tell their friends, which would ruin the reputation of the wrongdoer. For this reason both people have a strong incentive to act in good faith. On top of that I'm willing to take some risk to skip the paperwork.

When I was a teenager I worked a bit under legally very sketchy circumstances. They would send me to work in some warehouse for a few days, and draw up the contract for that work afterwards. This included me falsifying the date for my signature. This is not something I would have agreed to with a stranger, but the owner of my company was a friend of my parents, and I trusted my parents to slander them appropriately if they screwed me over.

I think my point is that this is not something very uncommon, because doing everything by the book is so much overhead, and sometimes not worth it.

I think being able to leverage reputation-based and/or ingroup-based trust is immensely powerful, and not something we should give up on.

For this reason, I think the most serious sin committed by Nonlinear is their alleged attempts to silence critics.
Update to clarify: This is based on the fact that people have been scared of criticising Nonlinear. Not based on any specific wording of any specific message.
Update: On reflection, I'm not sure if this is the worst part (if all accusations are true). But it's pretty high on the list.

I don't think making sure that no EA ever gives paid work to another EA without a formal contract will help much. The most vulnerable people are those new to the movement, who are exactly the people who will not know what the EA norms are anyway. An abusive org can still recruit people without contracts and just tell them this is normal.

I think a better defence mechanism is to track who is trustworthy or not, by making sure information like this comes out. And it's not like having a formal contract prevents all kinds of abuse.

Update based on responses to this comment: I do think having a written agreement, even just an informal expression of intentions, is almost always strictly superior to not having anything written down. When writing this comment I was thinking in terms of formal contract vs informal agreement, which is not the same as verbal vs written.

Comment by Linda Linsefors on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-20T12:01:55.754Z · LW · GW

But I think orgs are more likely to be well-known to grant-makers on average given that they tend to have a higher research output,

I think you're getting the causality backwards. You need money first, before there is an org. Unless you count informal multi-person collaborations as orgs.

I think people who are more well-known to grant-makers are more likely to start orgs. Whereas people who are less known are more likely to get funding at all if they aim for a smaller grant, i.e. as an independent researcher.

Comment by Linda Linsefors on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-20T11:57:02.075Z · LW · GW

Counterpoint. After the FTX collapse, OpenPhil said publicly (in some EA Forum post) that they were raising their bar for funding. I.e. there are things that would have been funded before that would now not be funded. The stated reason for this is that there is generally less money around, in total. To me this sounds like the thing you would do if money is the limitation.

I don't know why OpenPhil doesn't spend more. Maybe they have long timelines and also don't expect any more big donors any time soon? And this is why they want to spend carefully?

Comment by Linda Linsefors on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-20T11:51:56.460Z · LW · GW

From what I can tell, the field has been funding constrained since the FTX collapse.

What I think happened: 
FTX had lots of money and a low bar for funding, which meant they spread a lot of money around. This meant that more projects got started, and probably even more people got generally encouraged to join. Probably some projects got funded that should not have been, but probably also some really good projects got started that had not gotten money before because they did not clear the bar, due to not having the right connections or just being bad at writing grant proposals. In short, FTX money and the promise of FTX money made the field grow quickly. There was also some normal field growth. AIS has been growing steadily for a while.

Then FTX imploded. There was lots of chaos. Grants were promised but never paid out. Some orgs don't want to spend the money they did get from FTX because of the risk of clawbacks. Other grant makers covered some of this but not all of it. It's still unclear what the new funding situation is.

Some months later, SFF, FTX and Nonlinear Network have their various grant rounds. Each of them gets overwhelmed with applications. I think this is mainly from the FTX-induced growth spurt, but also partly orgs still trying to recover from the loss of FTX money, and just regular growth. Either way, the outcome of these grant rounds makes it clear that the funding situation has changed. The bar for getting funding is higher than before.

Comment by Linda Linsefors on Demystifying Born's rule · 2023-06-16T13:57:52.323Z · LW · GW

Not without spending more time than I want on this. Sorry.

Comment by Linda Linsefors on Demystifying Born's rule · 2023-06-16T13:02:32.461Z · LW · GW

I admit that I did not word that very well. I honestly don't know how to concisely express how much the Copenhagen interpretation makes no sense at all, not even as an interpretation.

Comment by Linda Linsefors on Why I'm Not (Yet) A Full-Time Technical Alignment Researcher · 2023-06-15T11:10:16.069Z · LW · GW

I would expect LTFF to still do upskilling grants, but I also expect that the bar is higher than it used to be. But this is just me making guesses.

Comment by Linda Linsefors on Why I'm Not (Yet) A Full-Time Technical Alignment Researcher · 2023-06-14T21:00:23.774Z · LW · GW

I really hope you get the funding you need to at least take some extended leave from your day job, to see what you can do when you get to sleep at whatever time you want, and also devote your mind to AI Safety full time.

Like others have said, this sounds like something you might potentially get a grant for. 

However, you should know that unfortunately money is tight for most projects since the FTX crash. I just got another reminder of this, seeing this post.

I still think you should apply! It's worth a try. But don't take it too hard if you get rejected.

Comment by Linda Linsefors on On the possibility of impossibility of AGI Long-Term Safety · 2023-06-14T14:24:43.262Z · LW · GW

I think this is an interesting post, and I think Forest is at least pointing to an additional AI risk, even if I'm not yet convinced it's not solvable.

However this post has one massive weakness, which it shares with "No people as pets". 

You are not addressing the possibility or impossibility of alignment. Your argument is based on the fact that we can't provide any instrumental value to the AI. This is just a re-phrasing of the classical alignment problem. I.e. if we don't specifically program the AI to care about us and our needs, it won't. 

I think if you are writing for the LW crowd, it will be much better received if you directly address the possibility or impossibility of building an aligned AI.

> Self-agency is defined here as having no limits on the decisions the system makes (and thus the learning it undergoes).

I find this to be an odd definition. Do you mean "no limits" as in the system is literally stochastic and every action has >0 probability? Probably not, because that would be a stupid design. So what do you mean? Probably that we humans can't predict its actions well enough to rule out any specific action. But there is no strong reason we have to build an AI like that.

It would be very useful if you could clarify this definition, so as to clarify what class of AI you think is impossible to make safe. Otherwise we risk just talking past each other.

Most of the post seems to discuss an ecosystem of competing silicon-based life forms. I don't think anyone believes that setup will be safe for us. This is not where the interesting disagreement lies.

Comment by Linda Linsefors on Demystifying Born's rule · 2023-06-14T13:11:44.816Z · LW · GW

On the other hand, pragmatically speaking, pilot wave theory does give the same predictions as other QM interpretations. So it's probably fine to use this interpretation if it simplifies other things.

Comment by Linda Linsefors on Demystifying Born's rule · 2023-06-14T13:02:34.416Z · LW · GW

I have a background in physics, and I don't like pilot wave theory, because the particle configuration is completely epiphenomenal. And by the way, I also don't like the Copenhagen interpretation, because it's not even a theory.

Also, last I heard, they had not figured out how to handle multiple particles, let alone field theory. But that was almost a decade ago, so there has probably been some progress.

Regarding explaining Born's rule: you have a point that many worlds leaves something to be explained here. On the other hand, there is no alternative. There is no other choice that preserves probability over time.
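The claim that only the squared amplitude is preserved over time can be checked numerically on a tiny example. This is a sketch under stated assumptions: a real 2x2 rotation stands in for unitary time evolution, and the helper names `rotate` and `weight_sum` are hypothetical.

```python
import math


def rotate(state, theta):
    """Apply a 2x2 rotation (a real unitary) to a two-component state."""
    a, b = state
    c, s = math.cos(theta), math.sin(theta)
    return (c * a - s * b, s * a + c * b)


def weight_sum(state, power):
    """Sum of |amplitude|**power, a candidate for 'total probability'."""
    return sum(abs(x) ** power for x in state)


before = (1.0, 0.0)
after = rotate(before, math.pi / 4)  # equal superposition of both components

if __name__ == "__main__":
    for power in (1, 2, 4):
        print(power, weight_sum(before, power), weight_sum(after, power))
```

Only the exponent 2 keeps the total at 1 both before and after the rotation. With exponent 1 the total grows and with exponent 4 it shrinks, so neither could be read as a conserved probability.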

Comment by Linda Linsefors on DragonGod's Shortform · 2023-05-10T02:04:07.337Z · LW · GW

The boring technical answer is that any policy can be described as a utility maximiser given a contrived enough utility function.

The counterargument to that is that if the utility function is as complicated as the policy, then this is not a useful description.
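Both halves of this can be seen in a few lines of code. This is a minimal sketch: `utility_from_policy`, `maximise`, and `example_policy` are hypothetical names, and the state and action spaces are toy assumptions.

```python
from typing import Callable, Hashable, Sequence

State = Hashable
Action = Hashable


def utility_from_policy(policy: Callable[[State], Action]):
    """Contrived utility: 1 for the action the policy would take, else 0."""
    def utility(state: State, action: Action) -> float:
        return 1.0 if action == policy(state) else 0.0
    return utility


def maximise(utility, actions: Sequence[Action]):
    """Agent that picks the utility-maximising action in each state."""
    def act(state: State) -> Action:
        return max(actions, key=lambda a: utility(state, a))
    return act


def example_policy(state: int) -> str:
    """Arbitrary toy policy used only for illustration."""
    return "left" if state % 2 == 0 else "right"


agent = maximise(utility_from_policy(example_policy), ["left", "right"])
```

The maximiser reproduces the policy exactly, but the utility function contains the whole policy inside it. So the description is technically valid and tells us nothing new.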

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2023-05-10T01:45:30.292Z · LW · GW

Today's thoughts: 

I suspect it's not possible to build autonomous aligned AIs (low confidence). The best we can do is some type of hybrid humans-in-the-loop system. Such a system will be powerful enough to eventually give us everything we want, but it will also be much slower and intellectually inferior to what is possible without humans in the loop. I.e. the alignment tax will be enormous. The only way the safe system can compete is by not building the unsafe system.

Therefore we need AI Governance. Fortunately, political action is getting a lot of attention right now, and the general public seems to be positively inclined to more cautious AI development. 

After getting an immediate stop/pause on larger models, I think the next step might be to use current AI to cure aging. I don't want to miss the singularity because I died first, and I think I'm not the only one who feels this way. It's much easier to be patient and cautious in a world where aging is a solved problem.

We probably need a strict ban on building autonomous superintelligent AI until we have reached technological maturity. It's probably not a great idea to build them after that either, but they will probably not pose the same risk any longer. This last claim is not at all obvious. The hardest attack vector to defend against would be manipulation. I think reaching technological maturity will make us able to defend against any military/hard-power attack. This includes for example having our own nanobot defence system, to defend against hostile nanobots. Manipulation is harder, but I think there are ways to solve that, with enough time to set up our defences.

An important crux for what the end goal is, including whether there is some stable end where we're out of danger, is to what extent technological maturity also leads to a stable cultural/political situation, or if that keeps evolving in ever new directions.

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2023-04-21T15:27:46.460Z · LW · GW

Recently an AI safety researcher complained to me about some interaction they had with an AI Safety communicator. Very stylized, their interaction went something like this:

(X is some fact or topic related to AI Safety)

Communicator: We don't know anything about X and there is currently no research on X.

Researcher: Actually, I'm working on X, and I do know some things about X.

Communicator: We don't know anything about X and there is currently no research on X.


I notice that I semi-frequently hear communicators saying things like the above. I think what they mean is that our understanding of X is far from the understanding that is needed, and the number of researchers working on this is much lower than what would be needed, and this gets rounded off to "we don't know anything and no one is doing anything about it". If this is what is going on then I think this is bad.

I think that in some cases when someone says "We don't know anything about X and there is currently no research on X." they probably literally mean it. There are some people who think that approximately no one working on AI Safety is doing real AI Safety research. But I also think that most people who are saying "We don't know anything about X and there is currently no research on X." are doing some mixture of rounding off and some sort of unreflective imitation learning, i.e. picking up the sentence structure from others, especially from high status people.

I think using language that hides the existence of the research that does exist is bad. Primarily because it's misinformative. Do we want all new researchers to start from scratch? Because that is what happens if you tell them there is no pre-existing research and they believe you. 

I also don't think this exaggeration will help with recruitment. Why do you think people would prefer to join a completely empty research field instead of a small one? From a personal success perspective (where success can mean either impact or career success) a small research field is great, with lots of low-hanging fruit around. But a completely untrodden research direction is terrible: you will probably just get lost and not get anything done, and even if you find something, there's nowhere to publish it.

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2023-04-20T10:23:46.899Z · LW · GW

Recording thought in progress...

I notice that I don't expect FOOM-like RSI, because I don't expect we'll get a mesa-optimiser with coherent goals. It's not hard to give the outer optimiser (e.g. gradient descent) a coherent goal. For the outer optimiser to have a coherent goal is the default. But I don't expect that to translate to the inner optimiser. The inner optimiser will just have a bunch of heuristics and proxy goals, and not be very coherent, just like humans.

The outer optimiser can't FOOM, since it doesn't do planning and doesn't have strategic self-awareness. It can only do some combination of hill climbing and random trial and error. If something is FOOMing it will be the inner optimiser, but I expect that one to be a mess.

I notice that this argument doesn't quite hold. More coherence is useful for RSI, but complete coherence is not necessary.

I also notice that I expect AIs to make fragile plans, but on reflection, I expect them to get better and better at this. By fragile I mean that the longer the plan is, the more likely it is to break. This is true for humans too though. But we are self-aware enough about this fact to mostly compensate, i.e. make plans that don't have too many complicated steps, even if the plan spans a long time.

Comment by Linda Linsefors on AI alignment researchers don't (seem to) stack · 2023-03-11T21:28:38.271Z · LW · GW

Like, as a crappy toy model, if every alignment-visionary's vision would ultimately succeed, but only after 30 years of study along their particular path, then no amount of new visionaries added will decrease the amount of time required from “30y since the first visionary started out”.


I think that a closer-to-true model is that most current research directions will lead approximately nowhere, but we don't know until someone goes and checks. Under this model, adding more researchers increases the probability that at least someone is working on a fruitful research direction. And I don't think you (So8res) disagree, at least not completely?
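This alternative toy model is easy to put numbers on. The sketch below assumes, purely for illustration, that each research direction is fruitful independently with the same probability `p`, and that each researcher explores a distinct direction. Both assumptions and the name `p_any_fruitful` are hypothetical.

```python
def p_any_fruitful(p: float, n: int) -> float:
    """Probability that at least one of n independent directions is fruitful.

    p is the (assumed, identical) chance that any single direction works out.
    """
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        print(n, p_any_fruitful(0.05, n))
```

Under these assumptions researchers do stack: each added direction raises the chance that someone is onto something, even though no individual path gets shorter.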

I don't think we're doing something particularly wrong here. Rather, I'd say: the space to explore is extremely broad; humans are sparsely distributed in the space of intuitions they're able to draw upon; people who have an intuition they can follow towards plausible alignment-solutions are themselves pretty rare; most humans don't have the ability to make research progress without an intuition to guide them. Each time we find a new person with an intuition to guide them towards alignment solutions, it's likely to guide them in a whole new direction, because the space is so large. Hopefully at least one is onto something.

I do think that researchers stack, because there are lots of different directions that can and should be explored in parallel. So maybe the crux is what fraction of people can do this? Most people I talk to do have research intuitions. I think it takes time and skill to cultivate one's intuition into an agenda that one can communicate to others, but just having enough intuition to guide oneself is a much lower bar. However, most people I talk to think they have to fit into someone else's idea of what AIS research looks like in order to get paid. Unfortunately I think this is a correct belief for everyone without exceptional communication skills and/or connections. But I'm honestly uncertain about this, since I don't have a good understanding of the current funding landscape.

Aside from money there are also imposter-syndrome type effects going on. A lot of people I talk to don't feel like they are allowed to have their own research direction, for vague social reasons. Some things that I have noticed sometimes help:

  • Telling them "Go for it!", and similar things. Repetition helps.
  • Talking about how young AIS is as a field, and the implications of this, including the fact that their intuitions about the importance of expertise are probably wrong when applied to AIS.
  • Handing over a post-it note with the text "Hero Licence".

Comment by Linda Linsefors on You Don't Exist, Duncan · 2023-03-05T13:57:36.475Z · LW · GW

I believe you that in some parts of Europe this is happening, which is good.

Comment by Linda Linsefors on Focus on the places where you feel shocked everyone's dropping the ball · 2023-02-08T00:27:39.025Z · LW · GW

I "feel shocked that everyone's dropping the ball".


Maybe not everyone
The Productivity Fund (
Although this project has been "Coming soon!" for several months now. If you want to help with the non-dropping of this ball, you could check in with them to see if they could use some help.

Comment by Linda Linsefors on Focus on the places where you feel shocked everyone's dropping the ball · 2023-02-08T00:22:58.763Z · LW · GW

Funding is not truly abundant. 

  • There are people who have above zero chance of helping that don't get upskilling grants or research grants. 
  • There are several AI Safety orgs that are for-profit in order to get investment money, and/or to be self-sufficient, because given their particular network, it was easier to get money that way (I don't know the details of their reasoning).
  • I would be more efficient if I had some more money and did not need to worry about budgeting in my personal life. 

I don't know to what extent this is because the money doesn't exist, and to what extent it is because grant evaluation is hard and there are some reasons not to give out money too easily.

Comment by Linda Linsefors on Focus on the places where you feel shocked everyone's dropping the ball · 2023-02-08T00:13:15.163Z · LW · GW

Is this... not what's happening?

Not by default.

I did not have this mindset right away. When I was new to AI Safety I thought it would require much more experience before I was qualified to question the consensus, because that is the normal situation in all the old sciences. I knew AI Safety was young, but I did not understand the implications at first. I needed someone to prompt me to get started.

Because I've run various events and co-founded AI Safety Support, I've talked to loooots of AI Safety newbies. Most people are too cautious when it comes to believing in themselves and too ready to follow authorities. It usually only takes a short conversation pointing out how incredibly young AI Safety is, and what that means, but many people do need this one push.

Comment by Linda Linsefors on You Don't Exist, Duncan · 2023-02-06T21:22:07.215Z · LW · GW

Yes, that makes sense. Having a bucket is definitely helpful for finding advice.

Comment by Linda Linsefors on You Don't Exist, Duncan · 2023-02-05T12:36:59.316Z · LW · GW

I can't answer for Duncan, but I have had similar enough experiences that I will answer for myself. When I notice that someone is chronically typical minding (not just typical minding as a prior, but showing signs that they are unable to even consider that others might be different in unexpected ways), then I leave as fast as I can, because such people are dangerous. Such people will violate my boundaries until I have a full meltdown. They will do so in the full belief that they are helpful, and override anything I tell them with their own prior convictions.

I tried to get over the feeling of discomfort when I felt misunderstood, and it did not work. Because it's not just a reminder that the world isn't perfect (something I can update on and get over), but an active warning signal.

Learning to interpret this warning signal, and knowing when to walk away, has helped a lot.

Different people and communities are more or less compatible with my style of weird. Keeping track of this is very useful.  

Comment by Linda Linsefors on You Don't Exist, Duncan · 2023-02-05T12:25:53.317Z · LW · GW

I think this comment is pointing in the right direction. But I disagree with

E.g. today we have buckets like "ADHD" and "autistic" with some draft APIs attached

There are buckets, but I don't know what the draft APIs would be. Unless you count "finding your own tribe and staying away from the neurotypicals" as an API.

If you know something I don't, let me know!

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2023-02-02T10:22:23.355Z · LW · GW

Yes, that is a thing you can do with decision transformers too. I was referring to a variant of the decision transformer (see link in the original shortform) where the AI samples the reward it's aiming for.

Comment by Linda Linsefors on AGI safety field building projects I’d like to see · 2023-01-22T15:20:07.275Z · LW · GW

I think having something like an AI Safety Global would be very high impact for several reasons. 

  1. Redirecting people who are only interested in AI Safety from EAG/EAGx to the conference they actually want to go to. This would be better for them and for EAG/EAGx. I think AIS has a place at EAG, but it's inefficient that lots of people go there basically only to talk to other people interested in AIS. That's not a great experience either for them, or for the people who are there to talk about all the other EA cause areas.
  2. Creating any amount of additional common knowledge in the AI Safety sphere. AI Safety is becoming big and diverse enough that different people are using different words in different ways, and relying on different unspoken assumptions. It's hard to make progress on top of the established consensus when there is no established consensus. I definitely don't think all AIS researchers should start agreeing on everything (and I don't want that). But just some common knowledge of what other researchers are doing would help a lot. I think that a yearly conference where each major research group gives an official presentation of what they are doing and their latest results would help a lot.
  3. Networking. 

I don't think that such a conference should double as a peer-review journal, the way many ML and CS conferences do. But I'm not very attached to this opinion. 

I think making it not CEA branded is the right choice. I think it's healthier for AIS to be its own thing, not a sub-community of EA, even though there will always be an overlap in community membership.

What's your probability that you'll make this happen?

I'm asking because if you don't do this, I will try to convince someone else to do it. I'm not the right person to organise this myself. I'm good at smaller, less formal events. My style would not fit with what I think this conference should be. I think the EAG team would do a good job at this though. But if you don't do it, someone else should. I also think the team behind the Human-aligned AI Summer School would do a good job at this, for example.

I responded here instead of over email, since I think there is value in having this conversation in public. But feel free to email me if you prefer.

Comment by Linda Linsefors on AI Safety Camp, Virtual Edition 2023 · 2023-01-09T11:17:53.174Z · LW · GW

There is no study material since this is not a course. If you are accepted to one of the project teams then you will work on that project.

You can read about the previous research outputs here: Research Outputs – AI Safety Camp

The most famous research to come out of AISC is the coin-run experiment.
We Were Right! Real Inner Misalignment - YouTube
[2105.14111] Goal Misgeneralization in Deep Reinforcement Learning

But the projects are different each year, so the best way to get an idea for what it's like is just to read the project descriptions. 

Comment by Linda Linsefors on AI Safety Camp, Virtual Edition 2023 · 2023-01-09T11:04:33.863Z · LW · GW

We don't have any rule against joining more than one project, but you'd have to convince us that you have time for it. As long as you don't have any other commitments it should be fine. But you would also have to be accepted to both projects separately, since each project lead makes the final decision as to who they want to accept.

I hope this answers your question, Mateusz.

Comment by Linda Linsefors on AI Safety Camp, Virtual Edition 2023 · 2023-01-07T18:44:11.258Z · LW · GW

Thanks for letting us know. I've fixed this now.

Comment by Linda Linsefors on AI Safety Camp, Virtual Edition 2023 · 2023-01-07T16:21:35.310Z · LW · GW

I recommend applying to all projects you are interested in. 

I don't remember if we made any official decision in regards to officially joining more than one team. I've posted the question to the other organisers. But either way, we do encourage teams to help each other out. 

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2023-01-07T12:56:51.691Z · LW · GW

Second reply. And this time I actually read the link.
I'm not surprised by that result.

My original comment was a reaction to claims of the type [the best way to solve almost any task is to develop general intelligence, therefore there is a strong selection pressure to become generally intelligent]. I think this is wrong, but I have not yet figured out exactly what the correct view is. 

But to use an analogy, it's something like this: In the example you gave, the AI gets better at the sub-tasks by learning on a more general training set. It seems like general capabilities were useful. But suppose we instead trained on even more data for a single sub-task. Wouldn't it then develop general capabilities, since we just noticed that general capabilities were useful for that sub-task? I was planning to say "no", but I notice that I do expect some transfer learning. I.e. if you train on just one of the datasets, I expect it to be bad at the other ones, but I also expect it to learn them quicker than without any pre-training.

I seem to expect that AI will develop general capabilities when training on rich enough data, i.e. almost any real world data. LLMs are a central example of this.

I think my disagreement with at least myself from some years ago, and probably some other people too (but I've been away a bit from the discourse so I'm not sure), is that I don't expect as much agentic long-term planning as I used to expect.

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2023-01-06T19:39:39.030Z · LW · GW

I agree that eventually, at some level of trying to solve enough different types of tasks, GI will be efficient, in terms of how much machinery you need, but it will never be able to compete on speed. 

Also, it's an open question what counts as "enough different types of tasks". Obviously, for a sufficiently broad class of problems GI will be more efficient (in the sense clarified above). Equally obviously, for a sufficiently narrow class of problems narrow capabilities will be more efficient.

Humans have GI to some extent, but we mostly don't use it. This is interesting. This means that a typical human environment is complex enough that it's worth carrying around the hardware for GI. But even though we have it, it is evolutionarily better to fall back on habits, imitation, or instinct in most situations.

Looking back at exactly what I wrote, I said there will not be any selection pressure for GI as long as other options are available. I'm not super confident in this. But I'm going to defend it here anyway by pointing out that "as long as other options are available" is doing a lot of the work here. Some problems are only solvable by noticing deep patterns in reality, and in this case a sufficiently deep NN with sufficient training will learn this, and that is GI.

Comment by Linda Linsefors on Air-gapping evaluation and support · 2022-12-28T11:30:56.183Z · LW · GW

I think the EA and AI safety communities could benefit from more confidential support roles, like the CEA community health team

They are not air-gapped!

Comment by Linda Linsefors on Methodological Therapy: An Agenda For Tackling Research Bottlenecks · 2022-12-01T17:07:54.301Z · LW · GW

I think we are in agreement.

I think the confusion is because it is not clear from that section of the post whether you are saying 
1) "you don't need to do all of these things" or
2) "you don't need to do any of these things".

Because I think 1 goes without saying, I assumed you were saying 2. Also 2 probably is true in rare cases, but this is not backed up by your examples.

But if 1 doesn't go without saying, then this means that a lot of "doing science" is cargo-culting? Which is sort of what you are saying when you talk about cached methodologies.

So why would smart, curious, truth-seeking individuals use cached methodologies? Do I do this?

Some self-reflection: I did some of this as a PhD student, because I was new, and it was a way to hit the ground running. So, I did some science using the method my supervisor told me to use, while simultaneously working to understand the reason behind this method. I did spend less time than I would have wanted on understanding all the assumptions of the sub-sub-field of physics I was working in, because of the pressure to keep publishing and because I got carried away by various fun math I could do if I just accepted these assumptions. After my PhD I felt that if I was going to stay in physics, I wanted to take a year or two just for learning, to actually understand Loop Quantum Gravity and all the other competing theories, but that's not how academia works unfortunately, which is one of the reasons I left.

I think that the foundation of good epistemics is to not have competing incentives.

Comment by Linda Linsefors on Methodological Therapy: An Agenda For Tackling Research Bottlenecks · 2022-11-29T20:57:35.747Z · LW · GW

In particular, four research activities were often highlighted as difficult and costly (here in order of decreasing frequency of mention):

  • Running experiments
  • Formalizing intuitions
  • Unifying disparate insights into a coherent frame
  • Proving theorems

I don't know what your first reaction to this list is, but for us, it was something like: "Oh, none of these activities seems strictly speaking necessary in knowledge-production." Indeed, a quick look at history presents us with cases where each of those activities was bypassed:

What these examples highlight is the classical failure when searching for the need of customers: to anchor too much on what people ask for explicitly, instead of what they actually need. 


I disagree that this conclusion follows from the examples. Every example you list uses at least one of the methods in your list. So, this might as well be used as evidence for why the methods on this list are important.

In addition, several of the listed examples benefited from division of labour. This is a common practice in Physics. Not everyone does experiments. Some people instead specialise in the other steps of science, such as 

  • Formalizing intuitions
  • Unifying disparate insights into a coherent frame
  • Proving theorems

This is very different from concluding that experiments are not necessary.


Comment by Linda Linsefors on Where are the red lines for AI? · 2022-11-26T20:57:54.504Z · LW · GW

I mostly agree with this post. 
That said, here are some points I don't agree with, and some extra nit-picking because Karl asked me for feedback.

The points above indicate that the line between “harmless” and “dangerous” must be somewhere below the traditional threshold of “at least human problem-solving capabilities in most domains”.

I don't think we know even this. I can imagine an AI that is successfully trained to imitate human behaviour, such that it has human problem-solving capabilities in most domains, but which does not pose an existential threat, because it just keeps behaving like a human. This could happen because this AI is not an optimiser but a "predict what a skilled human would do next and then do that" machine.

It is also possible that no such AI would be stable, because it would notice that it is not human, which would somehow cause it to go off the rails and start self-improving, or something. At the moment I don't think we have good evidence either way.

But while it is often difficult to get people to agree on any kind of policy, there are already many things which are not explicitly forbidden, but most people don’t do anyway,

The list of links to stupid things people did anyway doesn't exactly illustrate your point. But there is a possible argument here regarding the fact that the number of people who have access to teraflops of compute is much smaller than the number who have access to aquarium fluid.

Suppose we managed to create a widespread common-sense understanding of what AI we should not build. How long do you think it would take for some idiot to do it anyway, after it becomes possible?

(think for example of social media algorithms pushing extremist views, amplifying divisiveness and hatred, and increasing the likelihood of nationalist governments and dictatorships, which in turn increases the risk of wars).

I don't think the algorithms have much to do with this. I know this is a claim that keeps circulating, but I don't know what the evidence is. Clearly social media have political influence, but to me this seems to have more to do with the massively increased communication connectivity than with anything about the specific algorithms.

This will require a lot more research. But there are at least some properties of an AI that could be relevant in this context:

I think this is a good list. On first read I wanted to add agency/agentic-ness/optimiser-similarity, but thinking some more I think this should not be included. The reason not to put it on the list is the combination of:

  1. Agency is a vague, hard-to-define concept.
  2. The relevant aspects of agency (from the perspective of safety) are covered by strategic awareness and stability. So probably don't add it to the list. 

However, you might want to add the similar concept "consequentialist reasoning ability". Although it can be argued that this is just the same as "world model".

Comment by Linda Linsefors on Why don't we have self driving cars yet? · 2022-11-14T12:54:23.828Z · LW · GW

I don't think that self-driving cars are an AGI-complete problem, but I also have not thought a lot about this question. I would appreciate hearing your reasoning for why you think this is the case. Or maybe I misunderstood you? In which case I'd appreciate a clarification.