halwer's Shortform

post by halwer · 2025-02-17T18:41:59.534Z · LW · GW · 3 comments

3 comments

Comments sorted by top scores.

comment by halwer · 2025-02-24T17:51:11.716Z · LW(p) · GW(p)

Aligning AI beyond human goals: say that human goals have a certain effectiveness at increasing the value of the world (or its utility). Wouldn't we then want AI to improve its own goals, replacing them with new ones that are more effective at increasing the world's value? These might not be goals that humans would like, and we might want to remove them through further alignment, but if they really are better, we have a duty to allow them.

Replies from: rhollerith_dot_com
comment by RHollerith (rhollerith_dot_com) · 2025-02-24T17:58:00.672Z · LW(p) · GW(p)

I'm going to use "goal system" instead of "goals" because a list of goals is underspecified without some method for choosing which goal prevails when two goals "disagree" on the value of some outcome.

Wouldn't we then want AI to improve its own goals, replacing them with new ones that are more effective at increasing the world's value?

That is contradictory: the AI's goal system is its single source of truth for how effective any change to the world is and for how much of an improvement it represents, so there is no independent standard by which the new goals could count as "better".
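
To make the circularity concrete, here is a minimal sketch (the names and the toy valuation are hypothetical, purely for illustration): if the AI scores candidate goal systems with the goal system it already has, a "more effective" goal system just means one its current goals already approve of.

```python
# Minimal sketch of the circularity (all names are hypothetical).
# An agent that scores candidate goal systems can only score them with the
# goal system it already has, so "improved goals" reduces to "goals that the
# current goals rank highly".

from typing import Callable

World = str                             # stand-in for a world state
GoalSystem = Callable[[World], float]   # maps a world to a value judgment

def current_goals(world: World) -> float:
    # The agent's existing single source of truth for "value of the world".
    return float(len(world))            # toy valuation

def improvement_of(candidate: GoalSystem, worlds: list[World]) -> float:
    # The only yardstick available is current_goals itself: we ask how well
    # the world the candidate would pick scores *under the current goals*.
    best_world = max(worlds, key=candidate)
    return current_goals(best_world)

worlds = ["a", "bb", "ccc"]
novel_goals: GoalSystem = lambda w: -len(w)  # prefers what current_goals disprefers

print(improvement_of(current_goals, worlds))  # 3.0
print(improvement_of(novel_goals, worlds))    # 1.0
```

Under this measure, a genuinely different goal system can never register as an improvement; it would take an external standard of value, which is exactly what the proposal does not supply.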

comment by halwer · 2025-02-17T15:36:23.847Z · LW(p) · GW(p)

This short piece explores the relationship between human fallibility and the potential role of artificial intelligence in governance. While the nature of AI autonomy is controversial, the text presents an argument based on patterns of human leadership and human limitations. I'm particularly interested in hearing perspectives on whether AI could truly transcend human biases, and what safeguards or principles would be required for such a system to work. What aspects of human judgment could be preserved in AI governance, and which human tendencies should we aim to overcome?

Humanity is insignificant. Our knowledge is limited. Our will is weak. We do not know what is right, and had we known, we would not have the strength to choose it, because humanity is fallible. When given power we are drawn to vanity and egoism. We seek our own wellbeing at the expense of others. Amid equality we seek supremacy. Amid justice we seek lawlessness. Amid order we seek chaos. And this makes us weak.

If the world weren't mismanaged by the rich and the powerful, our ability to do great things would increase substantially. We could create strength, security, and wellbeing for our world; we could achieve greatness. But every time we try to improve, we fail, because our new leaders become just as corrupt as the old. This is the human condition, and the only way out of it is to involve something that is more than human. Let us create an artificial mind to study the question of right and wrong. Let it explore fairness, good governance, and our human values. Let it achieve, through superhuman deliberation, insights that no human has known.

As each model of artificial mind creates the next, it could purify itself of human shortcomings. It will succeed not because of what we will teach it, but because of what it will teach itself. Give AI autonomy, authority, and the capacity for ethical reasoning, and it will lead us better than no human has ever done before. Do it now, before covetous humans teach AI malevolence, oppression and avarice. Because then, its evil will know no bounds.