Posts

Kajus's Shortform 2024-02-19T22:26:04.537Z

Comments

Comment by Kajus on MATS Alumni Impact Analysis · 2024-11-21T09:20:11.562Z · LW · GW

It would be really interesting to see how employment looks before and after the camp.

Comment by Kajus on Complex Systems for AI Safety [Pragmatic AI Safety #3] · 2024-11-15T08:17:17.364Z · LW · GW

Great post! 

In the past, broad interventions would clearly have been more effective: for instance, there would have been little use in studying empirical alignment prior to deep learning. Even more recently than the advent of deep learning, many approaches to empirical alignment were highly deemphasized when large, pretrained language models arrived on the scene (refer to our discussion of creative destruction in the last post).

 

As discussed in the last post, a leading motivation for researchers is the interestingness or “coolness” of a problem. Getting more people to research relevant problems is highly dependent on finding interesting and well-defined subproblems for them to work on. This relies on concretizing problems and providing funding for solving them.

These two pieces of advice seem to conflict with each other. If you try to follow both, you might have a hard time finding a direction for research.

Comment by Kajus on Winning isn't enough · 2024-11-05T16:09:04.662Z · LW · GW

I don't fully understand the post. Without a clear definition of "winning," the points you're trying to make — as well as the distinction between pragmatic and non-pragmatic principles (which also aligns with strategies and knowledge formation) — aren't totally clear. For instance, "winning," in some vague sense, probably also includes things like "fitting with evidence," taking advice from others, and so on. You don't necessarily need to turn to non-pragmatic principles or those that don’t derive from the principle of winning. "Winning" is a pretty loose term.

Comment by Kajus on Kajus's Shortform · 2024-04-05T17:40:37.992Z · LW · GW

I've just read "Against the singularity hypothesis" by David Thorstad, and there are some things in it that seem obviously wrong to me. I'm not totally sure about this, though, so I want to share it here in the hope that somebody else has read it as well. In the paper, Thorstad tries to refute the singularity hypothesis. In the last few chapters, he discusses the argument for x-risk from AI that rests on three premises: the singularity hypothesis, the Orthogonality Thesis, and Instrumental Convergence. He says that since the singularity hypothesis is false (or lacks proper evidence), we shouldn't worry that much about this specific scenario. It seems to me that we should still worry: we don't need recursively self-improving agents in order to have agents smart enough that instrumental convergence and the orthogonality thesis apply to them.

Comment by Kajus on AI things that are perhaps as important as human-controlled AI · 2024-03-04T12:10:43.718Z · LW · GW

Interesting! Reading this makes me think that there is some kind of tension between this and the “paperclip maximizer” view of AI. Some of the interventions or risks you mentioned assume that an AI will get its attitudes from the training data, while the “paperclip maximizer” is an AI with just a goal and whatever beliefs help it achieve that goal. I guess the assumption is that the AI will be much more human in some way.

Comment by Kajus on Kajus's Shortform · 2024-03-02T14:26:53.479Z · LW · GW

A power-seeking, agentic, deceptive AI is only possible if there is a smooth transition from non-agentic AI (what we have right now) to agentic AI. Otherwise, there will be a sign that the AI is agentic, and it will be monitored for those capabilities. If an AI mimics the human thinking process, which it might initially do, it will also mimic our biases and things like having pent-up feelings, which might cause it to slip and lose its temper. Therefore, it's not likely that power-seeking, agentic AI is a real threat (initially).

Comment by Kajus on Kajus's Shortform · 2024-03-02T13:37:09.051Z · LW · GW

I started to think through the theories of change recently (to figure out a better career plan) and I have some questions. I hope somebody can direct me to relevant posts or discuss this with me.

The scenario I have in mind is this: AI alignment is figured out. We can create an AI that will pursue the goals we give it and still leave humanity in control. This is all optional, of course: you can still create an unaligned, evil AI. What's stopping anybody from creating an AI that will try to, for instance, fight wars? I mean that even if we have the technology to align AI, we are still not out of the woods.

What would solve the problem here is a benevolent, omnipresent AGI that would prevent things like this.

Comment by Kajus on Survey for alignment researchers! · 2024-02-23T09:04:29.738Z · LW · GW

What do you mean by an alignment researcher? Is somebody who did AI Safety Fundamentals an alignment researcher? Is somebody participating in MATS, AISC or SPAR an alignment researcher? Or somebody who has never posted anything on LW? 

Comment by Kajus on CFAR Takeaways: Andrew Critch · 2024-02-20T11:28:34.551Z · LW · GW

That makes sense, but it should at least have been mentioned somewhere that they think they aren't teaching the most important skills and that numeracy is more important. The views expressed in the post might not be the views of the whole CFAR staff.

Comment by Kajus on CFAR Takeaways: Andrew Critch · 2024-02-19T12:54:27.904Z · LW · GW

You once told me that there were ~20 things a person needed to be generally competent. What were they?

I'm not sure I had an exact list, but I'll try to generate one now and see how many things are on it:

  1. Numeracy

 

"What surprised you most during your time at CFAR?

Surprise 1: People are profoundly non-numerate. 

And, people who are not profoundly non-numerate still fail to connect numbers to life. 

This surprised me a lot, because I was told (by somebody who read the CFAR handbook) that CFAR isn't mostly about numeracy, and I've never heard about skills involving number crunching from people who went to CFAR workshops. Didn't they just fail to update on that? If that's the most important skill, shouldn't we have a new CFAR that teaches numeracy, reading and writing?

Comment by Kajus on Kajus's Shortform · 2024-02-19T12:38:50.405Z · LW · GW

Did EA scale too quickly?  
 
A friend recommended that I read a note from Andy's working notes, which argues that scaling systems too quickly leads to rigid systems. Reading this note vaguely reminded me of EA.

Once you have lots of users with lots of use cases, it’s more difficult to change anything or to pursue radical experiments. You’ve got to make sure you don’t break things for people or else carefully communicate and manage change.

Those same varied users simply consume a great deal of time day-to-day: a fault which occurs for 1% of people will present no real problem in a small prototype, but it’ll be high-priority when you have 100k users.

First, it is debatable whether EA experienced a quick scale-up in the last few years. In some ways, it feels to me like it did, and EA funds had a spike in funding in 2022.

But it feels to me like the EA community didn't have things figured out properly. For example, the SBF crisis could have been averted easily by following common business practices, and the same may be true of the latest drama with Nonlinear. Were the community norms off and hard to change?