Posts

LLMs can strategically deceive while doing gain-of-function research 2024-01-24T15:45:08.795Z
Psychology of AI doomers and AI optimists 2023-12-28T17:55:31.686Z
5 psychological reasons for dismissing x-risks from AGI 2023-10-26T17:21:48.580Z
Let's talk about Impostor syndrome in AI safety 2023-09-22T13:51:18.482Z
Impending AGI doesn’t make everything else unimportant 2023-09-04T12:34:42.208Z
6 non-obvious mental health issues specific to AI safety 2023-08-18T15:46:09.938Z
What is everyone doing in AI governance 2023-07-08T15:16:10.249Z
A couple of questions about Conjecture's Cognitive Emulation proposal 2023-04-11T14:05:58.503Z
How do we align humans and what does it mean for the new Conjecture's strategy 2023-03-28T17:54:23.982Z
Problems of people new to AI safety and my project ideas to mitigate them 2023-03-01T09:09:02.681Z
Emotional attachment to AIs opens doors to problems 2023-01-22T20:28:35.223Z
AI security might be helpful for AI alignment 2023-01-06T20:16:40.446Z
Fear mitigated the nuclear threat, can it do the same to AGI risks? 2022-12-09T10:04:09.674Z

Comments

Comment by Igor Ivanov (igor-ivanov) on OpenAI's CBRN tests seem unclear · 2024-11-22T19:43:35.825Z · LW · GW

And I'm unsure that experts are comparable, to be frank. Due to financial limitations, I used graduate students in BioLP, while the authors of LAB-bench used PhD-level scientists.

Comment by Igor Ivanov (igor-ivanov) on OpenAI's CBRN tests seem unclear · 2024-11-22T10:02:50.657Z · LW · GW

I didn't have o1 in mind; those exact results do seem consistent. Here's an example of what I had in mind:

Claude 3.5 Sonnet (old) scores 48% on ProtocolQA and 7.1% on BioLP-bench.
GPT-4o scores 53% on ProtocolQA and 17% on BioLP-bench.

Comment by Igor Ivanov (igor-ivanov) on OpenAI's CBRN tests seem unclear · 2024-11-21T20:45:13.221Z · LW · GW

Good post. 

The craziest thing for me is that the results of different evals, like ProtocolQA and my BioLP-bench, which are supposed to evaluate similar things, are highly inconsistent. For example, two models can have similar scores on ProtocolQA, yet one answers twice as many questions correctly on BioLP-bench as the other. It means we might not be measuring the things we think we measure, and no one knows what causes this difference in the results.

Comment by Igor Ivanov (igor-ivanov) on AI Safety Evaluations: A Regulatory Review · 2024-03-21T22:33:08.310Z · LW · GW

This is an amazing overview of the field. Even if it won't collect tons of upvotes, it is super important, and it saved me many hours. Thank you.

Comment by igor-ivanov on [deleted post] 2024-03-08T23:38:31.503Z

I tried to use exact quotes when describing the things they sent me, because it's easy for me to misrepresent their actions, and I don't want that to be the case.

Comment by Igor Ivanov (igor-ivanov) on LLMs can strategically deceive while doing gain-of-function research · 2024-01-26T17:14:20.303Z · LW · GW

Totally agree. But in other cases, when the agent was discouraged from deceiving, it still did so.

Comment by Igor Ivanov (igor-ivanov) on 5 psychological reasons for dismissing x-risks from AGI · 2023-10-30T11:47:07.240Z · LW · GW

Thanks for your feedback. It's always a pleasure to see that my work is helpful for people. I hope you will write articles that are way better than mine!

Comment by Igor Ivanov (igor-ivanov) on 5 psychological reasons for dismissing x-risks from AGI · 2023-10-30T11:43:51.798Z · LW · GW

Thanks for your thoughtful answer. It's interesting how I just describe my observations, and people draw conclusions from them that I hadn't thought of.

Comment by Igor Ivanov (igor-ivanov) on How have you become more hard-working? · 2023-09-27T13:23:36.476Z · LW · GW

For me it was quetiapine, a medication for my bipolar disorder.

Comment by Igor Ivanov (igor-ivanov) on Impending AGI doesn’t make everything else unimportant · 2023-09-05T15:08:01.847Z · LW · GW

Thanks. I got a bit clickbaity with the title.

Comment by Igor Ivanov (igor-ivanov) on Impending AGI doesn’t make everything else unimportant · 2023-09-05T15:07:13.735Z · LW · GW

Thanks for sharing your experience. I hope you stay strong.

Comment by Igor Ivanov (igor-ivanov) on Impending AGI doesn’t make everything else unimportant · 2023-09-05T00:45:39.762Z · LW · GW

The meaninglessness comes from an idea akin to "why bother with anything if AGI will destroy everything?"

Read Feynman's quote at the beginning. It describes his feelings about the atomic bomb, which are relevant to how some people think about AGI.

Comment by Igor Ivanov (igor-ivanov) on Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023) · 2023-09-04T23:09:33.278Z · LW · GW

Thanks

Comment by Igor Ivanov (igor-ivanov) on 6 non-obvious mental health issues specific to AI safety · 2023-08-20T15:42:31.135Z · LW · GW

Your comment is somewhat along the lines of Stoic philosophy.

Comment by Igor Ivanov (igor-ivanov) on Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023) · 2023-08-19T21:13:15.533Z · LW · GW

Hi

In this post you asked people to share the names of therapists familiar with alignment.

I am such a therapist. I live in the UK. That's my website.

I recently wrote a post about my experience as a therapist with clients working on AI safety. It might serve as indirect proof that I really have such clients. 

Comment by Igor Ivanov (igor-ivanov) on 6 non-obvious mental health issues specific to AI safety · 2023-08-19T17:39:12.154Z · LW · GW

This is tricky. Might it exacerbate your problems?

Anyway, if there's a chance I can be helpful to you, let me know.

Comment by Igor Ivanov (igor-ivanov) on 6 non-obvious mental health issues specific to AI safety · 2023-08-19T16:19:06.795Z · LW · GW

These problems are not unique to AI safety, but they come up far more often with my clients working on AI safety than with my other clients.

Comment by Igor Ivanov (igor-ivanov) on 6 non-obvious mental health issues specific to AI safety · 2023-08-18T22:47:57.586Z · LW · GW

Thanks. I am not a native English speaker, and I use GPT-4 to help me catch mistakes, but it seems like it's not perfect :)

Comment by Igor Ivanov (igor-ivanov) on 6 non-obvious mental health issues specific to AI safety · 2023-08-18T21:43:46.824Z · LW · GW

Thanks for sharing your experience. For me, talking with people outside AI safety feels similar to conversations about global warming: if someone tells me about it, I say it is an important issue, but I honestly don't invest much effort in fighting it.

This is my experience, and yours might be different.

Comment by Igor Ivanov (igor-ivanov) on AI scares and changing public beliefs · 2023-04-07T17:34:17.494Z · LW · GW

I totally agree that it might be good to have such a fire alarm as soon as possible, and seeing how fast people are making GPT-4 more and more powerful, I think it's only a matter of time.

Comment by Igor Ivanov (igor-ivanov) on AI scares and changing public beliefs · 2023-04-07T16:48:06.715Z · LW · GW

I believe we need a fire alarm.

People have been scared of nuclear weapons since 1945, but no one restricted the arms race until the Cuban Missile Crisis in 1962.

We know for sure that the crisis really scared both the Soviet and US high commands, and the first document restricting nukes was signed the next year, 1963.

What kind of fire alarm might it be? That is the question.

Comment by Igor Ivanov (igor-ivanov) on AI scares and changing public beliefs · 2023-04-07T16:23:39.026Z · LW · GW

I think an important way to convince people of the importance of AI safety is to find proper "gateway drug" ideas: issues that already bother a person, so they are likely to accept the idea and, through it, get interested in AI safety.

For example, if a person is concerned about the rights of minorities, you might tell them about how we don't know how LLMs work and how this causes bias and discrimination, or how AI will increase inequality.

If a person cares about privacy and is afraid of government surveillance, then you might tell them about how AI might make all these problems much worse.

Comment by Igor Ivanov (igor-ivanov) on Cognitive Emulation: A Naive AI Safety Proposal · 2023-04-02T18:41:33.642Z · LW · GW

Eh. It's sad if this problem is really so complex.

Thank you. At this point, I feel like I have to stick with some way of aligning AGI, even if it doesn't have a big chance of succeeding, because it looks like there aren't that many options.

Comment by Igor Ivanov (igor-ivanov) on Cognitive Emulation: A Naive AI Safety Proposal · 2023-03-31T17:25:26.264Z · LW · GW

Thanks for your detailed response!

But why do you think this project will take so much time? Why can't it be implemented faster?

Comment by Igor Ivanov (igor-ivanov) on More information about the dangerous capability evaluations we did with GPT-4 and Claude. · 2023-03-30T13:03:19.692Z · LW · GW

Do you have any plans for inter-lab communication based on your evals?

I think your evals might be a good place for AGI labs to standardize protocols for safety measures.

Comment by Igor Ivanov (igor-ivanov) on The Wizard of Oz Problem: How incentives and narratives can skew our perception of AI developments · 2023-03-29T18:16:54.678Z · LW · GW

I think the Wizard of Oz problem is in large part about being mindful and honest with oneself.

Wishful thinking is somewhat the default state for people. It's hard to be critical of one's own ideas and wishes, especially when things like money or career advancement are at stake.

Comment by Igor Ivanov (igor-ivanov) on How do we align humans and what does it mean for the new Conjecture's strategy · 2023-03-29T16:55:51.191Z · LW · GW

Thank you! The idea of inter-temporal coordination looks interesting.

Comment by Igor Ivanov (igor-ivanov) on How do we align humans and what does it mean for the new Conjecture's strategy · 2023-03-28T18:14:03.603Z · LW · GW

Thank you

Comment by Igor Ivanov (igor-ivanov) on Cognitive Emulation: A Naive AI Safety Proposal · 2023-03-24T18:51:06.339Z · LW · GW

Can you elaborate on your comment? 

It seems intriguing to me, and I would love to learn more about why it's a bad strategy if our AGI timeline is 5 years or less.

Comment by Igor Ivanov (igor-ivanov) on Cognitive Emulation: A Naive AI Safety Proposal · 2023-03-23T19:46:31.358Z · LW · GW

Why do you think that it will not be competitive with other approaches?

For example, it took 10 years to sequence the first human genome. After nearly 7 years of work, a competitor started an alternative human genome project using a completely different technology, and both projects finished at approximately the same time.

Comment by Igor Ivanov (igor-ivanov) on GPT-4: What we (I) know about it · 2023-03-15T23:02:25.383Z · LW · GW

I think we are entering black swan territory, and it's hard to predict anything.

Comment by Igor Ivanov (igor-ivanov) on GPT-4: What we (I) know about it · 2023-03-15T21:56:32.718Z · LW · GW

I absolutely agree with the conclusion. Everything is moving so fast.

I hope these advances will cause massive interest in the alignment problem from all sorts of actors. Even if OpenAI is talking about safety (and recently they have started talking about it quite often) in large part for PR reasons, it still means they think society is concerned about the progress, which is a good sign.

Comment by Igor Ivanov (igor-ivanov) on Questions about Conjecure's CoEm proposal · 2023-03-13T18:22:29.994Z · LW · GW

What are examples of “knowledge of building systems that are broadly beneficial and safe while operating in the human capabilities regime”?

I assume the mentioned systems are institutions like courts, governments, corporations, or universities.

Charlotte thinks that humans and advanced AIs are universal Turing machines, so predicting capabilities is not about whether a capability is present at all, but whether it is feasible in finite time with a low enough error rate.

I have a similar thought: if an AI has human-level capabilities and part of its job is to write texts, but it writes long texts in seconds and can do so 24/7, is that still within the range of human capabilities?

Comment by Igor Ivanov (igor-ivanov) on Problems of people new to AI safety and my project ideas to mitigate them · 2023-03-07T16:22:21.600Z · LW · GW

Thanks for your view on doomerism and your thoughts on the framing of hope.

Comment by Igor Ivanov (igor-ivanov) on Fighting without hope · 2023-03-02T09:17:12.532Z · LW · GW

One thing helping me preserve hope is that there are so many unknown variables about AGI and how humanity will respond to it that I don't think any current-day prediction is worth much.

Although I must admit that doomers like Connor Leahy and Eliezer Yudkowsky can be extremely persuasive, they also don't know many important things about the future, and they are full of cognitive biases too. All of this makes me tell myself the mantra "There is still hope that we might win."

I am not sure whether this is the best way to think about these risks, but I feel that if I give it up, it's a straightforward path to existential anxiety and misery, so I try not to question it too much.

Comment by Igor Ivanov (igor-ivanov) on Emotional attachment to AIs opens doors to problems · 2023-01-24T15:44:19.664Z · LW · GW

I agree. We have problems with emotional attachment to humans all the time, but humans are more or less predictable, not too powerful, and usually not so great at manipulation.

Comment by Igor Ivanov (igor-ivanov) on Emotional attachment to AIs opens doors to problems · 2023-01-23T20:34:48.057Z · LW · GW

Thank you for your comment and everything you mentioned in it. I am a psychologist entering the field of AI policy-making, and I am starving for content like this

Comment by Igor Ivanov (igor-ivanov) on Emotional attachment to AIs opens doors to problems · 2023-01-22T21:38:33.394Z · LW · GW

It does, and it causes a lot of problems, so I would prefer to avoid such problems with AIs 

Also, I believe that an advanced AI will be much more capable of deception and manipulation than an average human.

Comment by Igor Ivanov (igor-ivanov) on AGI safety field building projects I’d like to see · 2023-01-20T16:05:59.396Z · LW · GW

I 100% agree with you. 

I am entering the field right now, and I know several people in a position similar to mine. There are just no positions for people like me, even though I think I am very proactive and have valuable experience.

Comment by Igor Ivanov (igor-ivanov) on We don’t trade with ants · 2023-01-16T17:35:15.712Z · LW · GW

Good post, but there is a big imbalance in the human-ant relationship.

If people could communicate with ants, nothing would stop humans from making ants suffer if it made the deal better for humans, because of the power imbalance.

For example, domesticated chickens live in very crowded and stinky conditions, and their average lifespan is a month, after which they are killed. Not particularly good living conditions.

People just care about profitability and do it simply because they can.

Comment by Igor Ivanov (igor-ivanov) on Two reasons we might be closer to solving alignment than it seems · 2022-12-21T09:01:43.430Z · LW · GW

Good post

I have similar thoughts. I believe that at some point, fears about TAI will spread like wildfire, and the field will get a giant stream of people, money, and policies; that is hard to imagine from today's vantage point.

Comment by Igor Ivanov (igor-ivanov) on AGI Timelines in Governance: Different Strategies for Different Timeframes · 2022-12-20T21:04:04.535Z · LW · GW

First, your article is very insightful and well-structured, and I totally like it.

But there is one thing that bugs me.

I am new to the AI alignment field, and recently I realized (maybe mistakenly) that it is very hard to find a long-term, financially stable, full-time job in AI field-building.

To me, it basically means that only a tiny number of people consider AI alignment important enough to pay money to decrease P(doom). And at the same time, here we are talking about the possibility of doom within the next 10 or 20 years. To me it is all a bit crazy.

I also think that sooner or later, as AIs become more and more capable, either some large Chernobyl-like tragedy caused by AI will happen, or some AI will become so powerful that it will horrify people. In my opinion, the probability of that is very high. I already see how ChatGPT spread some fear, and fear might spread like wildfire. If this happens too late for governments to react thoughtfully, it will introduce a large amount of risk and uncertainty; in my opinion, too much of both.

So, in my opinion, even if we educate the public and promote government regulation, and AGI appears before 2030, government policies might suck. But if we don't do it, they might suck much more, which is even more dangerous.

Comment by Igor Ivanov (igor-ivanov) on Reframing the AI Risk · 2022-12-12T13:26:39.732Z · LW · GW

ChatGPT was recently launched, and it is so powerful that it made me think about the problem of misuse of a powerful AI. It's a very powerful tool; no one really knows how to use it yet, but I am sure we will soon see it used for unpleasant things.

But I also see AI increasingly being perceived as a living entity with agency. People are having conversations with ChatGPT as they would with a human.

Comment by Igor Ivanov (igor-ivanov) on Fear mitigated the nuclear threat, can it do the same to AGI risks? · 2022-12-09T16:01:15.792Z · LW · GW

I agree that fearmongering is thin ice and can easily backfire, and it must be done carefully and ethically. But is it worse than the alternative, in which people are unaware of AGI-related risks? I don't think anybody can say with certainty.

Comment by Igor Ivanov (igor-ivanov) on Fear mitigated the nuclear threat, can it do the same to AGI risks? · 2022-12-09T12:04:04.059Z · LW · GW

The reactor meltdown on a Soviet submarine did not pose an existential threat; in the worst case, it would have been a small version of Chernobyl. We might compare it to an AI that causes some serious problems, like a stock market crash, but not existential ones. And the movie is not a threat at all.

"The question is how plausible it is to generate situations that are scary enough to be useful, but under enough control to be safe."
That is a great summary of what I wanted to say!

Comment by Igor Ivanov (igor-ivanov) on A challenge for AGI organizations, and a challenge for readers · 2022-12-02T21:53:22.594Z · LW · GW

I agree

In my opinion, this methodology would be a great way for a model to learn how to persuade humans and exploit their biases, because the model might learn these biases not just from the data it collected but also fine-tune its understanding by testing its own hypotheses.