Posts

AISC9 has ended and there will be an AISC10 2024-04-29T10:53:18.812Z
AI Safety Camp final presentations 2024-03-29T14:27:43.503Z
Virtual AI Safety Unconference 2024 2024-03-13T13:54:03.229Z
Some costs of superposition 2024-03-03T16:08:20.674Z
This might be the last AI Safety Camp 2024-01-24T09:33:29.438Z
Funding case: AI Safety Camp 2023-12-12T09:08:18.911Z
AI Safety Camp 2024 2023-11-18T10:37:02.183Z
Projects I would like to see (possibly at AI Safety Camp) 2023-09-27T21:27:29.539Z
Apply to lead a project during the next virtual AI Safety Camp 2023-09-13T13:29:09.198Z
How teams went about their research at AI Safety Camp edition 8 2023-09-09T16:34:05.801Z
Virtual AI Safety Unconference (VAISU) 2023-06-13T09:56:22.542Z
AISC end of program presentations 2023-06-06T15:45:04.873Z
Project Idea: Lots of Cause-area-specific Online Unconferences 2023-02-06T11:05:27.468Z
AI Safety Camp, Virtual Edition 2023 2023-01-06T11:09:07.302Z
Why don't we have self driving cars yet? 2022-11-14T12:19:09.808Z
How I think about alignment 2022-08-13T10:01:01.096Z
Infohazard Discussion with Anders Sandberg 2021-03-30T10:12:45.901Z
AI Safety Beginners Meetup (Pacific Time) 2021-03-04T01:44:33.856Z
AI Safety Beginners Meetup (European Time) 2021-02-20T13:20:42.748Z
AISU 2021 2021-01-30T17:40:38.292Z
Online AI Safety Discussion Day 2020-10-08T12:11:56.934Z
AI Safety Discussion Day 2020-09-15T14:40:18.777Z
Online LessWrong Community Weekend 2020-08-31T23:35:11.670Z
Online LessWrong Community Weekend, September 11th-13th 2020-08-01T14:55:38.986Z
AI Safety Discussion Days 2020-05-27T16:54:47.875Z
Announcing Web-TAISU, May 13-17 2020-04-04T11:48:14.128Z
Requesting examples of successful remote research collaborations, and information on what made it work? 2020-03-31T23:31:23.249Z
Coronavirus Tech Handbook 2020-03-21T23:27:48.134Z
[Meta] Do you want AIS Webinars? 2020-03-21T16:01:02.814Z
TAISU - Technical AI Safety Unconference 2020-01-29T13:31:36.431Z
Linda Linsefors's Shortform 2020-01-24T13:08:26.059Z
1st Athena Rationality Workshop - Retrospective 2019-07-17T16:51:36.754Z
Learning-by-doing AI Safety Research workshop 2019-05-24T09:42:49.996Z
TAISU - Technical AI Safety Unconference 2019-05-21T18:34:34.051Z
The Athena Rationality Workshop - June 7th-10th at EA Hotel 2019-05-11T01:01:01.973Z
The Athena Rationality Workshop - June 7th-10th at EA Hotel 2019-05-10T22:08:03.600Z
The Game Theory of Blackmail 2019-03-22T17:44:36.545Z
Optimization Regularization through Time Penalty 2019-01-01T13:05:33.131Z
Generalized Kelly betting 2018-07-19T01:38:21.311Z
Non-resolve as Resolve 2018-07-10T23:31:15.932Z
Repeated (and improved) Sleeping Beauty problem 2018-07-10T22:32:56.191Z
Probability is fake, frequency is real 2018-07-10T22:32:29.692Z
The Mad Scientist Decision Problem 2017-11-29T11:41:33.640Z
Extensive and Reflexive Personhood Definition 2017-09-29T21:50:35.324Z
Call for cognitive science in AI safety 2017-09-29T20:35:16.738Z
The Virtue of Numbering ALL your Equations 2017-09-28T18:41:35.631Z
Suggested solution to The Naturalized Induction Problem 2016-12-24T16:03:03.000Z
Suggested solution to The Naturalized Induction Problem 2016-12-24T15:55:16.000Z

Comments

Comment by Linda Linsefors on LessWrong Community Weekend 2024 [Applications Open] · 2024-05-02T21:59:11.750Z · LW · GW

The EA SummerCamp takes place the next weekend


I've not been to any of these, but would like to. Is there any info up yet for this year's EA SummerCamp?

Comment by Linda Linsefors on AI Safety Camp final presentations · 2024-05-01T18:02:01.547Z · LW · GW

Yes, thanks for asking

Comment by Linda Linsefors on AISC9 has ended and there will be an AISC10 · 2024-04-30T09:24:34.313Z · LW · GW

Second part being "there will be an AISC10"?

Very sure. 

As long as Remmelt and I are still alive and healthy a couple of months from now, we're doing it.

Remmelt has organised 8 previous AISCs, and I've been part of 3 of those. We know what we are doing. We know we can rely on each other, and we want to do this.

We just needed to make sure we had money to live and eat and such before we could commit to running the next camp. But we have received the money now, so that's all good. Manifund has sent us the money; it's in our bank accounts.

I'll bet anyone who likes that there will be an AISC10, at 1:10 odds in your favour. I'm much more confident than that, but if you give me worse odds, then I don't think I can be bothered.

Comment by Linda Linsefors on A Review of In-Context Learning Hypotheses for Automated AI Alignment Research · 2024-04-19T11:08:34.427Z · LW · GW

I disagree. In verbal space MARS and MATS are very distinct, and they look different enough to me.

However, if you want to complain, you should talk to the organisers, not one of the participants.

Here is their website: MARS — Cambridge AI Safety Hub

(I'm not involved in MARS in any way.)
 

Comment by Linda Linsefors on AI Safety Camp final presentations · 2024-04-19T10:00:09.715Z · LW · GW

I've now updated the event information to include summaries/abstracts for the projects/talks. Some of these are still under construction.

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2024-03-31T01:39:41.660Z · LW · GW

Ok, you're right that this is a very morally clear story. My bad for not knowing what's a typical tabloid story.

Missing kid = bad,
seems like a good lesson for AI to learn.

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2024-03-30T17:08:07.344Z · LW · GW

I don't read much sensationalist tabloid news, but my impression is that the things that get a lot of attention in the press are things people can reasonably take either side of.

Scott Alexander writes about how everyone agrees that factory farming is terrible, but exactly because of this overwhelming agreement, it gets no attention. Which is why PETA does outrageous things to get attention.

The Toxoplasma Of Rage | Slate Star Codex

There need to be two sides to an issue, or else no-one gets ingroup loyalty points for taking one side or the other. 

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2024-03-29T01:16:08.666Z · LW · GW

Their more human-in-the-loop stuff seems neat though.

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2024-03-29T01:14:25.725Z · LW · GW

I found this on their website

Soon, interacting with AI agents will be a part of daily life, presenting enormous regulatory and compliance challenges alongside incredible opportunities.

Norm Ai agents also work alongside other AI agents who have been entrusted to automate business processes. Here, the role of the Norm Ai agent is to automatically ensure that actions other AI agents take are in compliance with laws.

I'm not sure if this is worrying, because I don't think AI overseeing AI is a good solution. Or maybe it's actually good because, again, it's not a good solution, which might lead to some early warnings?

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2024-03-29T01:04:42.768Z · LW · GW

Sensationalist tabloid news stories and other outrage porn are not the opposite. These are actually more of the same. More edge cases. Anything that is divisive has the problem I'm talking about. 

Fiction is a better choice.

Or even just completely ordinary every-day human behaviour. Most humans are mostly nice most of the time.

We might have to start with the very basics, the stuff we don't even notice because it's too obvious. Things no-one would think of writing down.

Comment by Linda Linsefors on Some costs of superposition · 2024-03-27T23:23:21.319Z · LW · GW

The math in the post is super hand-wavey, so I don't expect the result to be exactly correct. However, in your example, l up to 100 should be ok, since there is no superposition. 2.7 is almost 2 orders of magnitude off, which is not great.

Looking into what is going on: I'm basing my results on the Johnson–Lindenstrauss lemma, which gives an upper bound on the interference. In the post I'm assuming that the actual interference is the same order of magnitude as this upper bound. This assumption clearly fails in your example, since the interference between features is zero, and nothing is the same order of magnitude as zero.
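
For reference, the bound I'm leaning on is roughly of the following form (my own paraphrase, with m the number of unit feature directions packed into d dimensions, ε the worst-case pairwise interference such a packing can guarantee, and c an absolute constant):

m \lesssim \exp(c\,\epsilon^2 d) \quad\Longleftrightarrow\quad \epsilon \sim \sqrt{\ln(m)/(c\,d)}

So it bounds the worst-case interference achievable by the best possible packing; it says nothing about the actual interference of one particular set of features, which is where my assumption comes in.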

I might try to do the math more carefully, unless someone else gets there first. No promises though. 

I expect that my qualitative claims will still hold. This is based on more than the math, but the math seemed easier to write down. I think it would be worth doing the math properly, both to confirm my claims and because it may be useful to have more accurate quantitative formulas. I might do this if I get some spare time, but no promises.

my qualitative claims = my claims about what types of things the network is trading away when using superposition

quantitative formulas = how much of these things are traded away for what amount of superposition.

 

Comment by Linda Linsefors on Linda Linsefors's Shortform · 2024-03-27T22:52:06.857Z · LW · GW

Recently someone either suggested to me (or maybe told me they or someone were going to do this?) that we should train AI on legal texts, to teach it human values. Ignoring the technical problem of how to do this, I'm pretty sure legal texts are not the right training data. But at the time, I could not clearly put into words why. Today's SMBC explains this for me:

Saturday Morning Breakfast Cereal - Law (smbc-comics.com)

Law is not a good representation or explanation of most of what we care about, because it's not trying to be. Law is mainly focused on the contentious edge cases. 

Training an AI on trolley problems and other ethical dilemmas is even worse, for the same reason. 

Comment by Linda Linsefors on No Really, Why Aren't Rationalists Winning? · 2024-03-17T00:32:41.612Z · LW · GW

(Note: Said friend will be introducing himself on here and writing a sequence about his work later. When he does I will add the links here.)

 

Did you forget to add the links?

Comment by Linda Linsefors on What if Alignment is Not Enough? · 2024-03-11T17:46:56.549Z · LW · GW

I think point 5 is the main crux. 

Please click agree or disagree (check mark or cross) on this comment if you agree or disagree, since this is useful guidance for which parts people should prioritise when clarifying further. 

Comment by Linda Linsefors on Retrospective: PIBBSS Fellowship 2023 · 2024-03-08T17:52:59.560Z · LW · GW

Did you forget to provide links to research project outputs in the appendix? Or is there some other reason for this?

Comment by Linda Linsefors on Some costs of superposition · 2024-03-06T14:04:01.643Z · LW · GW

I think it's reasonable to think about what can be stored in a way that can be read off in a linear way (by the next layer), since those are the features that can be directly used in the next layer. 

storing them nonlinearly (in one of the host of ways it takes multiple nn layers to decode)

If it takes multiple nn layers to decode, then the nn needs to unpack it before using it, and represent it as a linearly readable feature later.
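
A toy illustration of what I mean by linearly readable (my own sketch, with made-up numbers): a feature is linearly readable if a single dot product against the residual stream recovers its value.

import numpy as np

rng = np.random.default_rng(0)
d = 512                                   # residual stream width
w_feat = rng.standard_normal(d)
w_feat /= np.linalg.norm(w_feat)          # direction assigned to the feature

value = 3.0                               # feature value we want to store
resid = value * w_feat + 0.01 * rng.standard_normal(d)   # stored value plus a bit of noise

# Linear readout: one dot product recovers the value (up to noise/interference),
# so the next layer can use it directly.
readout = resid @ w_feat
print(readout)                            # close to 3.0

# A feature stored nonlinearly (e.g. as the norm of some sub-vector) would first
# need to be unpacked by one or more layers before it is readable like this.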

Comment by Linda Linsefors on Some costs of superposition · 2024-03-05T11:59:31.494Z · LW · GW

Good point. I need to think about this a bit more. Thanks

Just quickly writing up my thoughts for now...

What I think is going on here is that the Johnson–Lindenstrauss lemma gives a bound on how well you can do, so it's more like a worst-case scenario. I.e. the Johnson–Lindenstrauss lemma gives you the worst-case error for the best possible feature embedding.

I've assumed that the typical noise would be the same order of magnitude as the worst case, but now I think I was wrong about this for large 

I'll have to think about which is more important: the worst case or the typical case. When adding up noise, one should probably use the typical case. But when calculating how many features to fit in, one should probably use the worst case. 
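
To spell out the distinction with a rough sketch (my own back-of-the-envelope, so don't trust the constants): for m roughly random unit feature directions in d dimensions, the interference between one fixed pair of features is typically about 1/sqrt(d), while the Johnson–Lindenstrauss-style bound is about the largest interference over all pairs:

|\langle f_i, f_j \rangle|_{\text{typical}} \sim 1/\sqrt{d}, \qquad \max_{i \neq j} |\langle f_i, f_j \rangle| \sim \sqrt{\ln(m)/d}

So summing up noise using the worst-case value would overestimate the typical noise by a factor of roughly \sqrt{\ln m}.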


 

Comment by Linda Linsefors on Some costs of superposition · 2024-03-03T19:57:38.169Z · LW · GW

Yes. Thanks for pointing this out. I changed notation and must have forgotten this one.

Comment by Linda Linsefors on Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality" · 2024-02-12T14:06:06.598Z · LW · GW

And... my guess in hindsight is that the "internal double crux" technique often led, in practice, to people confusing/overpowering less verbal parts of their mind with more-verbal reasoning, even in cases where the more-verbal reasoning was mistaken.


I'm confused about this. The way I remember it, though, was very much explicitly against this, i.e.: 

  • Be open to either outcome being right. 
  • Don't let the verbal part give the non-verbal part a dumb name. 
  • Make space for the non-verbal part to express itself in its natural modality, which is often inner sim. 

For me, IDC was very helpful in teaching me how to listen to my non-verbal parts. Reflecting on it, I never spent much time on the actual cruxing. When IDC-ing, I mostly spend time on actually hearing both sides. And when all the evidence is out, the outcome is most often obvious.  

But it was the IDC lesson and the Focusing lesson that taught me these skills. Actually, even more important than the skill itself was learning that this was possible. 

For me, probably the most important CFAR lesson was the noticing and "double-clicking" on intuitions. The one where Anna puts a glass of water on the edge of a table and/or writes expressions with the wrong number of parentheses.

Do most people come away from a CFAR workshop listening less to their non-verbal parts? 

I'm not surprised if people listening less to their non-verbal parts happens at all. But I would be surprised if that's the general trend. 

On the surface, Anna provides one data point, which is not much. But the fact that she brings up this data point makes me suspect it's representative? Is it?

Comment by Linda Linsefors on Survey for alignment researchers! · 2024-02-09T13:17:51.008Z · LW · GW

I timed how long it took me to fill in the survey. It took 30 min. I could probably have done it in 15 min if I had skipped the optional text questions. This is to be expected, however. Every time I've seen someone guess how long it will take to respond to their survey, it's off by a factor of 2-5. 

Comment by Linda Linsefors on This might be the last AI Safety Camp · 2024-01-28T01:07:38.782Z · LW · GW

This is a one-off thing though. We're not likely to continue to pay them, regardless of what they report. 

Comment by Linda Linsefors on Thoughts on AI Safety Camp · 2024-01-26T16:38:43.656Z · LW · GW

I just found this post (yesterday) while searching the EA Forum archives for something else. 

I've been co-organising AISC1 (2018), AISC8 (2023) and AISC9 (2024). This means that I was not involved when this was posted which is why I missed it. 

What you describe fits very well with my own view of AISC, which is reassuring. 

Comment by Linda Linsefors on This might be the last AI Safety Camp · 2024-01-26T16:30:46.581Z · LW · GW

This depends on how much you trust the actors involved.

I know that Remmelt and I asked for an honest evaluation, and did not try to influence the result. But you don't know this.

Remmelt and I obviously believe in AISC, otherwise we would not keep running these programs. But since AISC has been chronically understaffed (like most non-profit initiatives), we have not had time to do a proper follow-up study. When we asked Arb to do this assessment, it was in large part to test our own beliefs. So far nothing surprising has come out of the investigation, which is reassuring. But if Arb found something bad, I would not want them to hide it.

Here are some other evaluations of AISC (and other things) that were not commissioned by us. I think for both of them, they did not even talk to someone from AISC before posting, although for the second link this was only due to miscommunication. 

Comment by Linda Linsefors on This might be the last AI Safety Camp · 2024-01-25T21:06:34.744Z · LW · GW

Exactly what the minimum amount needed to organise an AISC is, is a bit complicated. 

We could do a super-budget version for under $58k which is even more streamlined. This would cut into quality, however. But the bigger problem is this (just speaking for myself):

  • If AISC pays enough for me to live frugally on this salary for the rest of the year, then I can come back and organise another one. (And as a bonus, the world also gets whatever else I do during the rest of the year, which will probably also be AI safety related.)
  • If that is not the case, I need to have a different primary income, and then I can't promise I'll be available for AISC.

Exactly what that threshold is, I don't know. It depends on my partner's income, which is also a bit uncertain. 

If I'm not available, is it possible to get someone else? Maybe, I'm not sure. My role requires both organising skill and AI safety knowledge. Most people who are qualified are busy. Also, a new person would initially have to put in more hours. Remmelt and I have a lot of experience doing AISC together, which means we can get it done quicker than someone new. 

We're also fundraising on our website: aisafety.camp
I think that Remmelt chose the $28k threshold hoping we'll get some money through other channels too. Currently we've got ~$5.5k in donations not through Manifund. 

If we get to the $28k threshold, and nothing more, we'll try to do something approximately like a next AISC, somehow. But in this case I'll probably quit after that. 

Comment by Linda Linsefors on This might be the last AI Safety Camp · 2024-01-25T20:25:29.983Z · LW · GW

Thanks Thomas for asking these questions. 

I think some of these are common concerns about AISC, partly because we have not always been very clear in our communication. This was a good opportunity for us to clarify. 

Comment by Linda Linsefors on This might be the last AI Safety Camp · 2024-01-25T20:16:26.614Z · LW · GW

I see your concern. 

Remmelt and I have different beliefs about AI risk, which is why the last AISC was split into two streams. Each of us is allowed to independently accept projects into our own stream.

Remmelt believes that AGI alignment is impossible, i.e. there is no way to make AGI safe. Exactly why Remmelt believes this is complicated, and something I myself am still trying to understand; however, this is actually not very important for AISC. 

The consequence of this for AISC is that Remmelt is only interested in projects that aim to stop AI progress. 

I still think that alignment is probably technically possible, but I'm not sure. I also believe that even if alignment is possible, we need more time to solve it. Therefore, I see projects that aim to stop or slow down AI progress as good, as long as there are no overly large adverse side-effects. So I'm happy to have Remmelt and the projects in his stream as part of AISC. Not to mention that Remmelt and I work really well together, despite our different beliefs.  

If you check our website, you'll also notice that most of the projects are in my stream. I've been accepting any project as long as there is a reasonable plan, there is a theory of change under some reasonable and self-consistent assumptions, and the downside risk is not too large. 

I've bounced around a lot in AI safety, trying out different ideas and starting more research projects than I finished, which has given me a wide view of different perspectives. I've updated many times in many directions, which has left me with wide uncertainty as to which perspective is correct. This is reflected in what projects I accept to AISC. I believe in a "let's try everything" approach. 

 

At this point, someone might think: if AISC is not filtering the projects more than just "seems worth a try", then how does AISC make sure not to waste participants' time on bad projects?

Our participants are adults, and we treat them as such. We do our best to present what AISC is, and what to expect, and then let people decide for themselves if it seems like something worth their time.

We also require research leads to do the same. I.e. the project plan has to provide enough information for potential participants to judge if this is something they want to join. 

I believe there is a significant chance that the solution to alignment is something no-one has thought of yet. I also believe that the only way to do intellectual exploration is to let people follow their own ideas, and avoid top-down curation. 

The only thing I filter hard for in my stream is that the research lead actually needs to have a theory of change. They need to have actually thought about AI risk, and why their plan could make a difference. I had this conversation with every research lead in my stream. 

We had one person last AISC who said that they regretted joining AISC, because they could have learned more from spending that time on other things. I take that feedback seriously. But on the other hand, I regularly meet alumni who tell me how useful AISC was for them, which convinces me AISC is clearly very net positive. 

However, if we were not understaffed (due to being underfunded), we could do more to support the research leads in making better projects.

Comment by Linda Linsefors on This might be the last AI Safety Camp · 2024-01-25T20:14:02.713Z · LW · GW
  • All but 2 of the papers listed on Manifund as coming from AISC projects are from 2021 or earlier. Because I'm interested in the current quality in the presence of competing programs, I looked at the two from 2022 or later: this in a second-tier journal and this in a NeurIPS workshop, with no top conference papers. I count 52 participants in the last AISC so this seems like a pretty poor rate, especially given that 2022 and 2023 cohorts (#7 and #8) could both have published by now.
  • [...] They also use the number of AI alignment researchers created as an important metric. But impact is heavy-tailed, so the better metric is value of total research produced. Because there seems to be little direct research, to estimate the impact we should count the research that AISC alums from the last two years go on to produce. Unfortunately I don't have time to do this.

That list of papers is for direct research output of AISC. Many of our alumni have lots of publications not on that list. 

For example, I looked up Marius Hobbhahn - Google Scholar

Just looking at the direct project outputs is not a good metric for evaluating AISC, since most of the value comes from the upskilling. Counting the research that AISC alumni have done since AISC is not a bad idea, but as you say, it's a lot more work. I imagine this is partly why Arb chose to do it the way they did. 

I agree that heavy-tailedness in research output is an important consideration. AISC does have some very successful alumni. If we didn't, this would be a major strike against AISC. The thing I'm less certain of is to what extent these people would have succeeded without AISC. This is obviously a difficult thing to evaluate, but still worth trying. 

Mostly we let Arb decide how best to do their evaluation, but I've specifically asked them to interview our most successful alumni, to at least get these people's estimate of the importance of AISC. The result of this will be presented in their second report.

Comment by Linda Linsefors on This might be the last AI Safety Camp · 2024-01-25T15:56:39.117Z · LW · GW
  • MATS has steadily increased in quality over the past two years, and is now more prestigious than AISC. We also have Astra, and people who go directly to residencies at OpenAI, Anthropic, etc. One should expect that AISC doesn't attract the best talent.


There is so much wrong here, I don't even know how to start (i.e. I don't know what the core cruxes are) but I'll give it a try. 

AISC is not MATS because we're not trying to be MATS. 

MATS is trying to find the best people and have them mentored by the best mentors, in the best environment. This is great! I'd recommend MATS to anyone who can get in. However, it's not scalable. After MATS has taken the top talent and mentors, there are still dozens of people who can mentor and would be happy to do so, and hundreds of people who are worth mentoring.

To believe that a MATS-style program is the only program worth running, you have to believe that:

  1. Only the top talent matters
  2. MATS and similar programs have perfect selection, i.e. no-one worth accepting is ever rejected.

I'm not going to argue about 1. I suspect it's wrong, but I'm not very sure.

However, believing in 1 is not enough. You also need 2, and believing in 2 is kind of insane. I don't know how else to put it. Sorry.

You're absolutely correct that AISC has lower average talent. But because we have a lower bar, we get the talent that MATS and other prestigious programs are missing. 

AISC is this way by design. The idea of AISC is to give as many people as we can the chance to join the AI safety effort, to try the waters, or to show the world what they can do, or to get inspiration to do something else. 

And I'm not even addressing the accessibility of a part-time online program. There are people who can't join MATS and similar programs, because they can't take the time to do so, but who can join AISC. 

Also, if you believe strongly in MATS' ability to select for talent, then consider that some AISC participants go on to attend MATS later. I think this fact proves my point, that AISC can support people whom MATS' selection process doesn't yet recognise.

  • If so, AISC might not make efficient use of mentor / PI time, which is a key goal of MATS and one of the reasons it's been successful.

This is again missing the point. The deal AISC offers to our research leads, is that they provide a project and we help them find people to work with them. So far our research leads have been very happy with this arrangement.

MATS is drawing their mentors from a small pool of well known people. This means that they have to make the most out of a very scarce resource. We're not doing that. 

AISC has an open application for people interested in leading a project. This way we get research leads you've never heard of, and who are happy to spend time on AISC in exchange for extra hands on their projects. 

One reason AISC is much more scalable than MATS is that we're drawing from a much larger pool of "mentors".

 

At this point, someone might think: So AISC has inexperienced mentors leading inexperienced participants.  How does this possibly go well?

This is not a trivial question. This is a big part of what the current version of AISC is focusing on solving. First of all, a research lead is not the same as a mentor. Research leads are welcome to provide mentorship to their participants, but that's not their main role.  

The research lead's role is to suggest a project and formulate a project plan, and then to lead that project. This is actually much easier to do than providing general mentorship. 

A key part of this is the project plans. As part of the application process for research leads, we require them to write down a project plan. When necessary, we help them with this. 

Another key part of how AISC is successful with less experienced "mentors" is that we require our research leads to take an active part in their projects. This obviously takes up more of their time, but it also makes things work better, and to a large extent makes up for the research leads being less experienced than in other programs. And as mentioned, we get lots of project leads who are happy with this arrangement.



What the participants get is learning by doing, by being part of a project that at least aims to reduce AI risk.

Some of our participants come from AI Safety Fundamentals and other such courses. Other people are professionals with various skills and talents, but not yet much involvement in AI safety. We help these people take the step from AI safety student, or AI-safety-concerned professional, to being someone who actually does something. 

Going from just thinking and learning to actively engaging is a very big step, and a lot of people would not have taken that step, or would have taken it later, if not for AISC.

Comment by Linda Linsefors on The Plan - 2023 Version · 2024-01-08T10:18:02.531Z · LW · GW

MIRI’s impossibility results

Which are these? I'm aware of a lot of MIRI's work, especially pre-2018, but nothing I would label "impossibility results".

Comment by Linda Linsefors on Interpreting the Learning of Deceit · 2023-12-29T20:15:56.143Z · LW · GW

Current Interpretability results suggest that roughly the first half of the layers in an LLM correspond to understanding the context at increasingly abstract levels, and the second half to figuring out what to say and turning that back from abstractions into concrete tokens. It's further been observed that in the second half, figuring out what to say generally seems to occur in stages: first working out the baseline relevant facts, then figuring out how to appropriately slant/color those in the current context, then converting these into the correct language, and last getting the nitty-gritty details of tokenization right.

How do we know this? This claim seems plausible, but also I did not know that mech-interp was advanced enough to verify something like this. Where can I read more?

Comment by Linda Linsefors on New Tool: the Residual Stream Viewer · 2023-10-26T16:26:29.666Z · LW · GW

It looks like this to me:

Where's the colourful text?
Is it broken or am I doing something wrong?

Comment by Linda Linsefors on Projects I would like to see (possibly at AI Safety Camp) · 2023-10-11T17:30:12.726Z · LW · GW

Potentially we might be ok with it if the expected timescale is long enough (or the probability of it happening in a given timescale is low enough).

Agreed. I'd love for someone to investigate the possibility of slowing down substrate-convergence enough for the problem to be basically solved.

If that's true then that is a super important finding! And also an important thing to communicate to people! I hear a lot of people who say the opposite and that we need lots of competing AIs.

Hm, to me this conclusion seems fairly obvious. I don't know how to communicate it though, since I don't know what the crux is. I'd be up for participating in a public debate about this, if you can find me an opponent. Although not until after AISC research lead applications are over and I've had some time to recover. So maybe late November at the earliest. 

Comment by Linda Linsefors on [deleted post] 2023-10-09T23:36:49.197Z

I've made an edit to remove this part.

Comment by Linda Linsefors on [deleted post] 2023-10-09T23:34:11.383Z

Inner alignment asks the question - “Is the model trying to do what humans want it to do?”

This seems inaccurate to me. An AI can be inner aligned and still not aligned if we solve inner alignment but mess up outer alignment. 

This text also shows up in the outer alignment tag: Outer Alignment - LessWrong 

Comment by Linda Linsefors on Projects I would like to see (possibly at AI Safety Camp) · 2023-09-29T10:29:45.491Z · LW · GW
  • An approach could be to say under what conditions natural selection will and will not sneak in. 

Yes!

  • Natural selection requires variation. Information theory tells us that all information is subject to noise and therefore variation across time. However, we can reduce error rates to arbitrarily low probabilities using coding schemes. Essentially this means that it is possible to propagate information across finite timescales with arbitrary precision. If there is no variation then there is no natural selection. 

Yes! The big question to me is whether we can reduce error rates enough. And "error rates" here means not just hardware signal errors, but also randomness that comes from interacting with the environment.

  • In abstract terms, evolutionary dynamics require either a smooth adaptive landscape such that incremental changes drive organisms towards adaptive peaks and/or unlikely leaps away from local optima into attraction basins of other optima. In principle AI systems could exist that stay in safe local optima and/or have very low probabilities of jumps to unsafe attraction basins. 

It has to be smooth relative to the jumps that can be achieved by whatever is generating the variation. Natural mutations don't typically make large jumps. But if you have a small change in motivation for an intelligent system, this may cause a large shift in behaviour. 

  • I believe that natural selection requires a population of "agents" competing for resources. If we only had a single AI system then there is no competition and no immediate adaptive pressure.

I thought so too to start with. I still don't know what the right conclusion is, but I think that substrate-needs convergence is at least still a risk even with a singleton. Something that is smart enough to be a general intelligence is probably complex enough to have internal parts that operate semi-independently, and therefore these parts can compete with each other. 

I think the singleton scenario is the most interesting, since I think that if we have several competing AIs, then we are just super doomed. 

And by singleton I don't necessarily mean a single entity. It could also be a single alliance. The boundaries between group and individual might not be as clear with AIs as with humans. 

  • Other dynamics will be at play which may drown out natural selection. There may be dynamics that occur at much faster timescales that this kind of natural selection, such that adaptive pressure towards resource accumulation cannot get a foothold. 

This will probably be correct for a time. But will it be true forever? One of the possible end goals for alignment research is to build the aligned superintelligence that saves us all. If substrate convergence is true, then this end goal is off the table. Because even if we reach this goal, it will inevitably either start to value-drift towards self-replication, or get eaten from the inside by parts that have mutated towards self-replication (AI cancer), or something like that.

  • Other dynamics may be at play that can act against natural selection. We see existence-proofs of this in immune responses against tumours and cancers. Although these don't work perfectly in the biological world, perhaps an advanced AI could build a type of immune system that effectively prevents individual parts from undergoing runaway self-replication. 

Cancer is an excellent analogy. Humans defeat it in a few ways that work together:

  1. We have evolved to have cells that mostly don't defect
  2. We have an evolved immune system that attacks cancer when it does happen
  3. We have developed technology to help us find and fight cancer when it happens
  4. When someone gets cancer anyway and it can't be defeated, only they die; it doesn't spread to other individuals. 

Point 4 is very important. If there is only one agent, this agent needs perfect cancer-fighting ability to avoid being eaten by natural selection. The big question to me is: is this possible?

If, on the other hand, you have several agents, then you definitely don't escape natural selection, because these entities will compete with each other. 

Comment by Linda Linsefors on Rationality: From AI to Zombies · 2023-09-27T21:18:03.206Z · LW · GW

I got into AI Safety. My interest in AI Safety lured me to a CFAR workshop, since it was a joint event with MIRI. I came for the Agent Foundations research, but the CFAR part turned out just as valuable. It helped me start to integrate my intuitions with my reasoning, through IDC and other methods. I'm still in AI Safety, mostly organising, but also doing some thinking, and still learning. 

My resume lists all the major things I've been doing. Not the most interesting format, but I'm probably not going to write anything better anytime soon.
Resume - Linda Linsefors - Google Docs

Comment by Linda Linsefors on Steering GPT-2-XL by adding an activation vector · 2023-09-26T10:41:12.492Z · LW · GW

We don't know why the +2000 vector works but the +100 vector doesn't. 

My guess is that it's because in the +100 case the vectors are very similar, causing their difference to be something unnatural.

"I talk about weddings constantly" and "I do not talk about weddings constantly" are technically opposites. But if you imagine someone saying these, you notice that their natural language meaning is almost identical. 

What sort of person says  "I do not talk about weddings constantly"? That sounds to me like someone who talks about weddings almost constantly. Why else would they feel the need to say that?

Comment by Linda Linsefors on Steering GPT-2-XL by adding an activation vector · 2023-09-26T10:16:23.600Z · LW · GW

To steer a forward pass with the "wedding" vector, we start running an ordinary GPT-2-XL forward pass on the prompt "I love dogs" until layer 6. Right before layer 6 begins, we now add in the cached residual stream vectors from before:

I have a question about the image above this text.

Why do you add the embedding from the [<endoftext> -> "The"] stream? This part has no information about weddings.
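
For reference, here is roughly how I understand the steering procedure, sketched with transformer_lens. This is my own approximation: the layer index, the coefficient, and truncating the longer prompt (instead of padding, which I believe the post does) are simplifications, not details taken from the post.

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-xl")   # any GPT-2 size works for the sketch
hook_name = "blocks.6.hook_resid_pre"                  # residual stream right before layer 6

# Cache the residual streams of the two contrast prompts.
_, cache_pos = model.run_with_cache("I talk about weddings constantly")
_, cache_neg = model.run_with_cache("I do not talk about weddings constantly")

a, b = cache_pos[hook_name], cache_neg[hook_name]
n = min(a.shape[1], b.shape[1])            # crude handling of unequal prompt lengths
steering = 4.0 * (a[:, :n] - b[:, :n])     # "wedding" vector, scaled by a coefficient

def add_steering(resid, hook):
    # Add the cached difference to the first positions of the new prompt's stream.
    resid = resid.clone()
    k = min(n, resid.shape[1])
    resid[:, :k] += steering[:, :k]
    return resid

# Ordinary forward pass on "I love dogs", with the steering vector injected before layer 6.
logits = model.run_with_hooks("I love dogs", fwd_hooks=[(hook_name, add_steering)])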

Comment by Linda Linsefors on AI presidents discuss AI alignment agendas · 2023-09-26T09:45:54.003Z · LW · GW

I had a bit of trouble hearing the difference in voice between Trump and Biden, at the start. I solved this by actually imagining the presidents. Not visually, since I'm not a visual person, just loading up the general gestalt of their voices and typical way of speaking into my working memory. 

Another way to put it: when I asked myself "which of the voices I've heard so far is this?" I sometimes could not tell. But when I asked myself "who is this among Obama, Trump and Biden?" it was always clear.

Comment by Linda Linsefors on Meta Questions about Metaphilosophy · 2023-09-25T19:09:59.186Z · LW · GW

If you think it would be helpful, you are welcome to suggest a metaphilosophy topic for AI Safety Camp.

More info at aisafety.camp. (I'm typing on a phone; I'll add an actual link later if I remember to.)

Comment by Linda Linsefors on Sharing Information About Nonlinear · 2023-09-13T13:05:43.144Z · LW · GW

This is a good point. I was thinking in terms of legal vs informal, not in terms of written vs verbal. 

I agree that having something written down is basically always better. Both for clarity, as you say, and because people's memories are not perfect. And it has the added bonus that if there is a conflict, you have something to refer back to.

Comment by Linda Linsefors on Sharing Information About Nonlinear · 2023-09-13T12:52:04.439Z · LW · GW

Thanks for adding your perspective. 

If @Rob Bensinger does in fact cross-post Linda's comment, I request he cross-posts this, too.

I agree with this.

Comment by Linda Linsefors on Sharing Information About Nonlinear · 2023-09-13T12:48:31.838Z · LW · GW

I'm glad you liked it. You have my permission to cross post.

Comment by Linda Linsefors on Sharing Information About Nonlinear · 2023-09-10T14:07:39.816Z · LW · GW

Thanks for writing this post.

I've heard enough bad stuff about Nonlinear before that I was seriously concerned about them. But I did not know what to do. Especially since part of their bad reputation is about attacking critics, and I don't feel well positioned to take that fight.

I'm happy some of these accusations are now out in the open. If it's all wrong and Nonlinear is blame free, then this is their chance to clear their reputation. 

I can't say that I will withhold judgment until more evidence comes in, since I already made a preliminary judgment even before this post. But I can promise to be open to changing my mind. 

Comment by Linda Linsefors on Sharing Information About Nonlinear · 2023-09-10T13:48:57.533Z · LW · GW

I have worked without legal contracts for people in EA I trust, and it has worked well.

Even if all the accusations against Nonlinear are true, I still have pretty high trust in people in EA or LW circles, such that I would probably agree to work with no formal contract again.

The reason I trust people in my ingroup is that if either of us screws over the other person, I expect the victim to tell their friends, which would ruin the reputation of the wrongdoer. For this reason, both people have a strong incentive to act in good faith. On top of that, I'm willing to take some risk to skip the paperwork.

When I was a teenager I worked a bit under legally very sketchy circumstances. They would send me to work in some warehouse for a few days, and draw up the contract for that work afterwards. Including me falsifying the date of my signature. This is not something I would have agreed to with a stranger, but the owner of the company was a friend of my parents, and I trusted my parents to slander them appropriately if they screwed me over. 

I think my point is that this is not very uncommon, because doing everything by the book is so much overhead, and sometimes not worth it.

I think being able to leverage reputation-based and/or ingroup-based trust is immensely powerful, and not something we should give up on.

For this reason, I think the most serious sin committed by Nonlinear is their alleged attempt at silencing critics. 
Update to clarify: This is based on the fact that people have been scared of criticising Nonlinear. Not based on any specific wording of any specific message.
Update: On reflection, I'm not sure if this is the worst part (if all accusations are true). But it's pretty high on the list.

I don't think making sure that no EA ever gives paid work to another EA without a formal contract will help much. The most vulnerable people are those new to the movement, who are exactly the people who will not know what the EA norms are anyway. An abusive org can still recruit people without contracts and just tell them this is normal. 

I think a better defence mechanism is to track who is trustworthy or not, by making sure information like this comes out. And it's not like having a formal contract prevents all kinds of abuse.

Update based on responses to this comment: I do think having a written agreement, even just an informal expression of intentions, is almost always strictly superior to not having anything written down. When writing this comment I was thinking in terms of formal contract vs informal agreement, which is not the same as verbal vs written. 

Comment by Linda Linsefors on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-20T12:01:55.754Z · LW · GW

But I think orgs are more likely to be well-known to grant-makers on average given that they tend to have a higher research output,


I think you're getting the causality backwards. You need money first, before there is an org. Unless you count informal multi-person collaborations as orgs. 

I think people who are more well-known to grant-makers are more likely to start orgs. Whereas people who are less known are more likely to get funding at all if they aim for a smaller grant, i.e. as an independent researcher. 

Comment by Linda Linsefors on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-20T11:57:02.075Z · LW · GW

Counterpoint: after the FTX collapse, OpenPhil said publicly (in some EA Forum post) that they were raising their bar for funding. I.e. there are things that would have been funded before that would now not be funded. The stated reason for this is that there is generally less money around, in total. To me this sounds like the thing you would do if money is the limitation. 

I don't know why OpenPhil doesn't spend more. Maybe they have long timelines and also don't expect any more big donors any time soon? And this is why they want to spend carefully?

Comment by Linda Linsefors on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-20T11:51:56.460Z · LW · GW

From what I can tell, the field has been funding-constrained since the FTX collapse.

What I think happened: 
FTX had lots of money and a low bar for funding, which meant they spread a lot of money around. This meant that more projects got started, and probably even more people got generally encouraged to join. Probably some projects got funded that should not have been, but probably also some really good projects got started that did not get money before, because they did not clear the bar earlier due to not having the right connections, or just being bad at writing grant proposals. In short, FTX money and the promise of FTX money made the field grow quickly. There was also some normal field growth; AIS has been growing steadily for a while. 

Then FTX imploded. There was lots of chaos. Grants were promised but never paid out. Some orgs don't want to spend the money they did get from FTX because of clawback risks. Other grant makers covered some of this, but not all of it. It's still unclear what the new funding situation is.

Some months later, SFF, FTX and Nonlinear Network had their various grant rounds. Each of them got overwhelmed with applications. I think this is mainly from the FTX-induced growth spurt, but also partly orgs still trying to recover from the loss of FTX money, and just regular growth. Either way, the outcome of these grant rounds made it clear that the funding situation has changed. The bar for getting funding is higher than before. 

Comment by Linda Linsefors on Demystifying Born's rule · 2023-06-16T13:57:52.323Z · LW · GW

Not without spending more time than I want on this. Sorry.

Comment by Linda Linsefors on Demystifying Born's rule · 2023-06-16T13:02:32.461Z · LW · GW

I admit that I did not word that very well. I honestly don't know how to concisely express how much the Copenhagen interpretation makes no sense at all, not even as an interpretation.