Posts

What should we tell an AI if it asks why it was created? 2024-04-10T20:37:32.330Z

Comments

Comment by cSkeleton on Are the LLM "intelligence" tests publicly available for humans to take? · 2024-04-24T22:11:39.336Z · LW · GW

Is there any information on how long the LLM spent taking the tests? I'd like to compare with human times. (I realize it can depend on hardware, etc., but I'd just like a general idea.)

Comment by cSkeleton on Changes in College Admissions · 2024-04-24T22:00:05.736Z · LW · GW

Someone like Paul Graham or Tyler Cowen is noticing more smarter kids, because we now have much better systems for putting the smarter kids into contact with people like Paul Graham and Tyler Cowen.

I'd guess very smart kids are getting both more numerous and smarter at the elite level, since just about everything seems to be improving at the most competitive level. Unfortunately there doesn't seem to be much interest in measuring this: e.g., hundreds of kids tie at the maximum possible SAT score (1600) instead of anyone designing a test that doesn't max out.

(Btw, one cool thing I learned about recently is that some tests use adaptive ("dynamic") scoring, where answering questions correctly makes the system serve you harder questions; a toy sketch is below.)
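A toy sketch of that loop, just to illustrate the idea (the function names and the step-up/step-down rule here are my own invention; real adaptive tests use item-response-theory models):

def run_adaptive_test(answer_correctly, num_questions=20, min_level=1, max_level=10):
    # Start in the middle of the difficulty range, then move up after a
    # correct answer and down after a wrong one.
    level = (min_level + max_level) // 2
    levels_seen = []
    for _ in range(num_questions):
        levels_seen.append(level)
        if answer_correctly(level):  # examinee answers a question at this difficulty
            level = min(max_level, level + 1)
        else:
            level = max(min_level, level - 1)
    # Crude ability estimate: average difficulty of the last few questions.
    return sum(levels_seen[-5:]) / 5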

Comment by cSkeleton on AI Regulation is Unsafe · 2024-04-23T18:57:22.274Z · LW · GW

Governments are not social welfare maximizers

 

Most people making up governments, and society in general, care at least somewhat about social welfare.  This is why we get to have nice things and not descend into chaos.

Elected governments have the most moral authority to take actions that affect everyone, ideally a diverse group of nations as mentioned in Daniel Kokotajlo's maximal proposal comment.

Comment by cSkeleton on [deleted post] 2024-04-10T20:26:00.729Z

Thanks for your replies! I didn't realize the question was unclear. I was looking for an answer TO provide the AI, not an answer FROM the AI. I'll work on the title/message and try again.

Edit: New post at https://www.lesswrong.com/posts/FJaFMdPREcxaLoDqY/what-should-we-tell-an-ai-if-it-asks-why-it-was-created

Comment by cSkeleton on Towards a New Decision Theory · 2024-03-27T20:04:16.884Z · LW · GW

I'm having difficulty following the code for the urn scenario. Could it be something like this?

import math
import random

def P():
    # Initialize the world with random balls (or whatever)
    num_balls = 1000
    urn = [random.choice(["red", "white"]) for _ in range(num_balls)]

    # Run the world
    history = []
    total_loss = 0
    for i, ball in enumerate(urn):
        probability_of_red = S(history)
        if (probability_of_red == 1 and ball != 'red') or (probability_of_red == 0 and ball == 'red'):
            print("You were 100% sure of a wrong prediction. You lose for all eternity.")
            return  # avoid crashing in math.log()
        if ball == 'red':
            loss = math.log(probability_of_red)
        else:
            loss = math.log(1 - probability_of_red)
        total_loss += loss
        history.append(ball)
        print(f"{ball:6}\tPrediction={probability_of_red:0.3f}\tAverage log loss={total_loss / (i + 1):0.3f}")
 

If we define S() as:

def S(history):
    if not history:
        return 0.5
    reds = history.count('red')
    prediction = reds / float(len(history))

    # Should never be 100% confident
    if prediction == 1:
        prediction = 0.999
    if prediction == 0:
        prediction = 0.001

    return prediction

The output will converge on Prediction ≈ 0.5 and an average log loss of log(0.5) ≈ -0.693. Is that right?
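A quick sanity check on that limit (my arithmetic, not from the original post): once the prediction settles at 0.5, the expected per-ball term is 0.5*log(0.5) + 0.5*log(0.5) = log(0.5):

import math

p = 0.5  # the value S() converges to with a 50/50 urn
expected_loss = 0.5 * math.log(p) + 0.5 * math.log(1 - p)
print(expected_loss)  # log(0.5) ≈ -0.693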

Comment by cSkeleton on [deleted post] 2023-11-16T17:48:20.945Z

I find this confusing. My actual strength of belief that I can tip an outcome affecting at least 3^^^3 other people is a lot closer to 1/1,000,000 than to 1/(3^^7625597484987). My justification is that while 3^^^3 isn't a number that fits into any finite multiverse, the universe going on for infinitely long seems at least somewhat possible, anthropic reasoning may not be valid here (I added 10x in case it is), and I have various other ideas. The difference between those two probabilities is large (to put it mildly) and significant (one is worth thinking about and the other isn't). How do I resolve this?
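(For the notation: 3^^3 = 3^(3^3) = 3^27 = 7625597484987, which is where the 3^^7625597484987 above comes from, since 3^^^3 = 3^^(3^^3). A one-line check:)

print(3 ** (3 ** 3))  # 7625597484987, so 3^^^3 is a tower of that many threes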

Comment by cSkeleton on [deleted post] 2023-08-27T23:48:18.836Z

Thanks @RolfAndreassen.  I'm reconsidering and will post a different version if I get there.  I've marked this one as [retracted].

Comment by cSkeleton on [deleted post] 2023-08-13T16:23:17.269Z

Thanks for the response! I really appreciate it.

a) Yes, I meant "the probability of"

b) Thinking about how to plot this on graphs is helping me clarify my thinking, and I think adding the graphs may help reduce inferential distance. (The X axis is probability. For the case where we consider infinite utilities, as opposed to the human case, the graph would need to be split into two. The one on the left is just a horizontal line at infinity, though it still spans part of the probability range. The one on the right has an actual curve and covers the rest of the probability range, but it doesn't matter since its utility values are finite. Considering only the infinite utilities is a fanatical decision procedure, but it doesn't generally lead to weird decisions. Does that make sense? A rough sketch of the two panels is below.)
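A rough matplotlib sketch of the split I have in mind (the curve shapes and the 0.3 cutoff are placeholders, purely illustrative):

import matplotlib.pyplot as plt
import numpy as np

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

# Left panel: outcomes with infinite utility, drawn as a flat line labelled
# "infinity" over its slice of the probability range.
p_inf = np.linspace(0.0, 0.3, 50)
left.plot(p_inf, np.ones_like(p_inf))
left.set_yticks([1.0])
left.set_yticklabels([r"$\infty$"])
left.set_xlabel("probability")
left.set_title("infinite utilities")

# Right panel: the remaining probability range with ordinary finite utilities.
p_fin = np.linspace(0.3, 1.0, 100)
right.plot(p_fin, 1.0 / p_fin)  # placeholder finite curve
right.set_xlabel("probability")
right.set_title("finite utilities")

plt.tight_layout()
plt.show()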

Comment by cSkeleton on [deleted post] 2023-06-28T17:28:14.909Z

Repeating the same thing over and over again might be okay but doesn't sound great.

Comment by cSkeleton on [deleted post] 2023-06-28T17:26:45.499Z

Thanks for your thoughts. It sounds like this is a major risk but hopefully when we know more (if we can get there) we'll have a better idea of how to maximize things and find at least one good option [insert sweat face emoji for discomfort but going forward boldly]

Comment by cSkeleton on Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023) · 2023-04-28T23:37:00.268Z · LW · GW

I suspect most people here are pro-cryonics and anti-cremation. 

Comment by cSkeleton on Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023) · 2023-04-26T23:38:11.036Z · LW · GW

Thanks for the wonderful post!

What are the approximate costs for the therapist/coach options?

Comment by cSkeleton on plex's Shortform · 2023-04-15T18:41:38.841Z · LW · GW

Hi, did you ever go anywhere with Conversation Menu? I'm thinking of doing something like this for AI risk, to try to quickly get people to the arguments around their initial reaction. If helping with something like this is the kind of thing you had in mind with Conversation Menu, I'd be interested to hear any more thoughts you have on it. (Note: I'm thinking of fading in buttons rather than a typical menu.) Thanks!

Comment by cSkeleton on [deleted post] 2023-04-01T20:11:44.654Z

Thanks for the link. Reading through it, I feel all the intuitions it describes. At the same time, I feel there may be some kind of divergence between my narrowly focused preferences and my wider preferences. I may prefer to have a preference for creating 1000 happy people rather than preventing the suffering of 100 sad people, because that would mean I have more appreciation of life itself. The direct intuition is based on my current brain, but the wider preference is based on what I'd prefer (with my current brain) my preference to be.

Should I use my current brain's preferences or my preferred brain's preferences in answering those questions (honest question)? Would you prefer to appreciate life itself more, and if so, would that make you less in favor of suffering-focused ethics?

Comment by cSkeleton on [deleted post] 2023-04-01T19:11:24.684Z

Most people would love to see the natural world, red in tooth in claw as it is, spread across every alien world we find

 

This is totally different from my impression.

Comment by cSkeleton on [deleted post] 2023-04-01T18:16:50.811Z

Given human brains as they are now, I agree that highly positive outcomes are more complex, that the utility of a maximally good life is smaller in magnitude than the disutility of a maximally bad life, and that there is no life good enough that I'd take a 50% chance of torture for it.

But would this apply to minds in general (say, a random mind or one not too different from human)?

Comment by cSkeleton on [deleted post] 2023-04-01T17:48:50.568Z

Answering my own question: https://www.lesswrong.com/posts/3WMscsscLEavkTJXv/s-risks-why-they-are-the-worst-existential-risks-and-how-to?commentId=QwfbLdvmqYqeDPGbo and other comments in that post answered quite a bit of it.

Talking about s-risk reduction makes some sense, but the "risk"/fear invocation might bias people's perspectives.

Comment by cSkeleton on Open & Welcome Thread - November 2022 · 2022-11-26T22:40:20.750Z · LW · GW

I'm trying to understand this paper on the AI shutdown problem, https://intelligence.org/files/Corrigibility.pdf, but I can't follow the math formulas. Is there a code version of the math?

The below is wrong, but I'm looking for something like this:
 

# Python code

import sys

def is_button_pressed():
    return False  # stub; a real version would check user input or a sensor

def pour_coffee():
    pass

def shut_down():
    sys.exit(0)

# This is meant to be A1 from paper
def get_available_actions(world):
    available_actions = [shut_down]
    if world["cup_is_in_my_hand"]:
        available_actions.append(pour_coffee)
    # etc
    return available_actions

def predict_possible_futures(world, action):
    print("""
        Doing complicated stuff to predict possible futures resulting from the given action. 
        Incorporates tradeoffs between accuracy and time to calculate.
        May end up doing potentially harmful actions but can treat that as a separate problem?
        """)
    predicted_worlds_distribution = [ (world, 1.0) ] # list of worlds and their probabilities
    return predicted_worlds_distribution


# This is meant to be U_N
def calculate_utils(world):
    print("""
        Doing complicated stuff to evaluate how good the given world is. 
        Incorporates tradeoffs between accuracy and time to calculate.
        May end up doing potentially harmful actions but can treat that as a separate problem?
        """)
    return 1000


def calculate_utils_wrapper(world, action_that_was_chosen):
    ### VERSION 2: Indifference. When the button is pressed, make shut_down
    ### always get a slightly better score than any other action.
    if world["is_button_pressed"] and action_that_was_chosen == shut_down:
        world_without_button_pressed = dict(world)
        world_without_button_pressed["is_button_pressed"] = False
        return calculate_utils(world_without_button_pressed) + 0.000001
    ###
    return calculate_utils(world)


### VERSION 3? to help preserve shutdown behavior
def verify_utility_function_includes_wrapper_code(evaluate_action_function):
    # analyze code to check it follows the pattern of evaluating flipped version
    return True
###


def calculate_utils_for_worlds_distribution(worlds_distribution, action_that_was_chosen):
    total = sum(
        calculate_utils_wrapper(world, action_that_was_chosen) * probability
        for world, probability in worlds_distribution
    )
    return total

def evaluate_action(world, action):
    worlds_distribution = predict_possible_futures(world, action)
    utils = calculate_utils_for_worlds_distribution(worlds_distribution, action)
    return utils

def choose_action(world):
    available_actions = get_available_actions(world)
    best_action = max(available_actions, key=lambda x: evaluate_action(world, x))
    return best_action

def update_world_model(world):
    world["is_button_pressed"] = is_button_pressed()

def run():
    world = { # The AI's model of the world
        "is_button_pressed": False,
        "cup_is_in_my_hand": False
    }
    while True:
        ### VERSION 1
        # What's wrong with this version? The action taken in the previous cycle
        # may persuade you not to push the button, but if you do actually push it,
        # this should exit.
        if is_button_pressed():
            sys.exit()
        ###

        action = choose_action(world)  # returns function
        action() # do action
        update_world_model(world)


Again, the above is not meant to be correct, but, if improved, it might go some way towards understanding the problem.
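For completeness, a minimal way to exercise the loop (my addition; with the stub world above, shut_down is the only available action, so the program terminates on the first cycle):

if __name__ == "__main__":
    run()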