No, definitely not, I didn't mean to give that impression. I do think that on a deeper level, though, when you consider why anyone does anything, it comes down to basic instinctual desires such as the need to feel loved or the need to feel powerful. In the absence of a rational motivator, whatever Sam Altman's primary instinct is will likely take over while the ego rationalizes. So money may be the result, but the real driver is likely a deep-seated desire for power or status.
I have had this same question for a while, and this is the general conclusion I've come to:
Identify the safety issues today, solve them, then assume the safety issues scale as the technology scales, and either scale up the original solution or develop new tactics to address these extrapolated flaws.
This sounds a little vague, so here is an example: we see one of the big models misrepresent history in an attempt to be woke, and maybe it gives a teenager a misconception of history. The best thing we can do from a safety perspective is figure out how to train models to represent facts accurately. Once that's done, we can extrapolate the flaw up to a model deliberately feeding misinformation to achieve a certain goal, and try to apply the same solution we used for the smaller problem to the bigger one, or, if we can see it won't work, develop a new solution.
The biggest problem with this is that it is reactive: if you only use this method, a danger may present itself for the first time and cause major harm before you have a solution.
I know this approach isn't as effective for x-risk, but it's still something I like to use. Easy to say, though, coming from someone who doesn't actually work in AI safety.