I think I'm just confused. Once a model exists, how do you "red-team" it to see whether it's safe. Isn't it already dangerous? 2023-11-18T14:16:26.536Z
Beginner's question about RLHF 2023-08-08T15:48:24.118Z
Random Observation on AI goals 2023-04-08T19:28:01.062Z
The alien simulation meme doesn't make sense 2023-02-24T19:27:11.916Z
I believe some AI doomers are overconfident 2022-12-20T17:09:23.325Z


Comment by FTPickle on I think I'm just confused. Once a model exists, how do you "red-team" it to see whether it's safe. Isn't it already dangerous? · 2023-11-18T21:51:45.916Z · LW · GW

Yes! I agree. I've been saying for a while that one possible outcome is that a medium-danger model comes out: one that isn't all that dangerous itself, but that convinces everyone the potential for huge danger exists, and we shut things down.

Comment by FTPickle on I think I'm just confused. Once a model exists, how do you "red-team" it to see whether it's safe. Isn't it already dangerous? · 2023-11-18T18:05:05.206Z · LW · GW

I'm a beginner but I don't think this is right.

Pretend the model "wants" to kill us all.  You can't red-team it, can you?  Like imagine a model that wants to kill us all and you send me to see if it's dangerous.  If I come back and say "Yup, it's dangerous!  Shut it down" then it wasn't so dangerous in the first place.  

If I come back and say "No worries everyone-- it's fine!" and then three weeks later the world ends, then I guess it was dangerous

Comment by FTPickle on Mesa-Optimization: Explain it like I'm 10 Edition · 2023-08-26T23:28:59.258Z · LW · GW


Comment by FTPickle on Beginner's question about RLHF · 2023-08-26T16:46:40.366Z · LW · GW

Tyty very helpful and illuminating

Comment by FTPickle on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-02T18:55:33.167Z · LW · GW

Thank you for the response but I have to honestly report that the clue did not in fact help :)

Comment by FTPickle on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-02T18:53:12.368Z · LW · GW

Jesus, what a great response. Sorry I didn't respond earlier; I didn't see this until just now.  Thank you!

I'm interested in the second bullet-point in your "Only in SGD" section.  This is actually the crux of what I'm curious about.  I've read enough about evolution that I "get" how it works, in that I can see that evolution sometimes goes down kind of weird paths because it relies on small changes.  You aren't going to turn a bacterium into a bird within x generations etc., and I can imagine a situation where something doesn't get hard-optimized because the path that ended up being taken wasn't perfect, and evolution can't backtrack.

I feel like the details of how SGD differs, particularly how you point out it's continuous (I'm not sure I fully understand but what I'm reading is 'you can make wholesale changes across the entire agent without relying on individual genetic mutations') might actually matter a lot.  I'm guessing I couldn't learn more specifics without investing a lot of time though.
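To make the contrast concrete for myself, here's a toy sketch in Python (entirely my own illustration, not anything from your comment; the objective and all names are made up): an SGD step moves every weight at once, directly downhill along the gradient, while an evolution-style step can only try random small mutations and keep the best one.

```python
import random

# Toy objective: minimize f(w) = sum of squared weights (lower = "fitter").
def loss(w):
    return sum(x * x for x in w)

def gradient(w):
    # Analytic gradient of the loss above: d/dx (x^2) = 2x for each weight.
    return [2 * x for x in w]

def sgd_step(w, lr=0.1):
    # Gradient descent: every weight updates at once, directly downhill.
    g = gradient(w)
    return [x - lr * gx for x, gx in zip(w, g)]

def evolution_step(w, pop=20, sigma=0.1):
    # Evolution-style: sample random small mutations, keep the fittest.
    candidates = [[x + random.gauss(0, sigma) for x in w] for _ in range(pop)]
    candidates.append(w)  # the parent can survive unchanged
    return min(candidates, key=loss)

w_sgd = [1.0, -2.0, 0.5]
w_evo = [1.0, -2.0, 0.5]
for _ in range(50):
    w_sgd = sgd_step(w_sgd)
    w_evo = evolution_step(w_evo)

# SGD typically ends up far closer to the optimum than the mutation search
print(loss(w_sgd), loss(w_evo))
```

On this toy problem SGD converges much faster and more directly, which I take to be part of the "continuous" point: it doesn't have to wait for a lucky mutation, it just computes the improving direction for the whole agent at once.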

Anyways thanks for the great response it was useful to me :)

Comment by FTPickle on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-07-21T00:48:21.469Z · LW · GW

In what specific respects is gradient descent analogous/not-analogous to evolution?  I hear Eliezer Yudkowsky talk about this all the time-- I understand evolution and natural selection reasonably well but I have no background whatsoever in CS and can't understand his claims because of this.

Comment by FTPickle on AI #5: Level One Bard · 2023-04-02T11:47:24.048Z · LW · GW

You do a huge service to the world by writing these up.  Thank you!


Comment by FTPickle on Against AGI Timelines · 2023-03-13T14:31:08.746Z · LW · GW

Excellent article, in my view

Comment by FTPickle on Effective ways to find love? · 2022-12-29T16:54:11.684Z · LW · GW

Honestly, this sounds stupid, but I would start a regular meditation practice if you don't already have one.  Commit to spending fifteen minutes a day for a year and if you don't see any changes you can just drop it with no harm done.  

Don't expect anything to happen for a while though; just do it every day w/ zero expectations.  My guess is within a few months you will notice positive changes in your life, including in love/relationships

Good luck whether you try the meditation or not :)

Comment by FTPickle on Staring into the abyss as a core life skill · 2022-12-22T15:56:43.294Z · LW · GW

This is so good.  Meditation has helped me more than anything else at staring into the abyss-- you are just there with the thoughts so they are harder to ignore.  It's amazing how much you can still ignore them though!

Comment by FTPickle on I believe some AI doomers are overconfident · 2022-12-21T17:43:57.941Z · LW · GW

It depends on what you mean by "go very badly" but I think I do disagree. 

Again, I don't know what I'm talking about, but "AGI" is a little too broad for me.  If you told me that you could more or less simulate my brain in a computer program and that this brain had the same allegiances to other AIs and itself that I currently have for other humans, and the same allegiance to humans that I currently have for even dogs (which I absolutely love), then yes I think it's all over and we die.

If you say to me, "FTPickle, I'm not going to define AGI.  It is a promise that in 2027 an AGI emerges.  Is it more likely than not that humanity is wiped out by this event?"  I would gulp and pick 'no.'

The difference between "plausible" and "likely" is huge, I think. Again, huge caveat that AGI may be more specifically defined than I am aware of.

Comment by FTPickle on I believe some AI doomers are overconfident · 2022-12-21T16:37:23.115Z · LW · GW

Yeah I totally agree with that article-- it's almost tautologically correct in my view, and I agree that the implications are wild

I'm specifically pushing back on the ppl saying it is likely that humanity ends during my daughter's lifetime--  I think that claim specifically is overconfident.  If we extend the timeline then my objection collapses.

Comment by FTPickle on I believe some AI doomers are overconfident · 2022-12-21T14:18:01.562Z · LW · GW

Hmmm.  I don't feel like I'm saying that.  This isn't the perfect analogy, but it's kind of like AI doomers are looking at an ecosystem and predicting that if you introduce wolves into the system the wolves will become overpopulated and crush everything.  There may be excellent reasons to believe this:

  1. Wolves are more powerful than any other animal
  2. They have a powerful hunting drive
  3. The other animals have never encountered wolves

etc etc

I just think that it's too complex to really feel confident, even if you have really excellent reasons to believe it will happen.  Maybe wolves do horribly on hills and we didn't know that before we let them loose in this environment etc.

It's not on me to come up with reasons why the wolves won't take over-- simply saying "it's incredibly complex and we shouldn't be too confident about this even though it seems reasonable" is enough in my view

Comment by FTPickle on I believe some AI doomers are overconfident · 2022-12-20T20:39:50.847Z · LW · GW

It's not symmetric in my view: The person positing a specific non-baseline thing has the burden of proof, and the more elaborate the claim, the higher the burden of proof.  

"AI will become a big deal!" faces fewer problems than "AI will change our idea of humanity!" faces fewer problems than "AI will kill us all!" faces fewer problems than "AI will kill us all with nanotechnology!"

Comment by FTPickle on What to do if a nuclear weapon is used in Ukraine? · 2022-10-24T17:49:02.725Z · LW · GW

tyty this makes sense

Comment by FTPickle on AGI in our lifetimes is wishful thinking · 2022-10-24T17:46:45.550Z · LW · GW

Thank you--  I love hearing pessimistic takes on this.

The only issue I'd take is that I believe most people here are genuinely frightened of AI.  The seductive part, I think, isn't the excitement of AI, but the excitement of understanding something important that most other people don't seem to grasp.  

My main hope, in terms of AGI being far off, is that there's some sort of circle-jerk going on on this website, where everyone is basing their opinion on everyone else, who in turn are basing theirs on everyone else, etc.

I mean obviously the arguments themselves are good and compelling and the true luminaries in the field have good reasons, but take for instance me.  I'm genuinely frightened of AGI and believe there is a ~10% chance my daughter will be killed by it before the end of her natural life, but honestly all of my reasons for worry boil down to "other smart people seem to think this."

Like, I get the arguments for AGI doom.  They make sense.  But the truth is if Eliezer Y came out tomorrow and said "holy shit I was wrong we don't have to worry at all because of the MHR-5554 module theorem" and then Nick Bostrom said "Yup!  Stop worrying everyone.  Thank you MHR-5554!  What a theorem!"  I would instantly stop worrying. 

I think (hope?) that many people on this site are in the same boat as me

Comment by FTPickle on What to do if a nuclear weapon is used in Ukraine? · 2022-10-19T20:54:05.432Z · LW · GW

I don't know anything about this topic.  My initial thought is "Well, maybe I'd move to Montana."  Why is this no good?

Comment by FTPickle on Life, Death, and Finance in the Cosmic Multiverse · 2022-10-17T13:54:38.292Z · LW · GW

Oh my god this is so great.  You may just be restating things that are obvious to anyone who studies and thinks about this stuff, but to me it is quite illuminating and I've only read a portion so far.  I bookmarked this into my "Awesome Reads" folder


Comment by FTPickle on All AGI safety questions welcome (especially basic ones) [Sept 2022] · 2022-09-08T14:52:56.313Z · LW · GW

From my limited understanding, one concern is that an AI will more or less think to itself, "Well, let's see.  I'm not currently powerful enough to overtake all humans, but I recognize that this should in fact be my ultimate goal.  I'm going to basically wait here until either I come up with a better plan, or things develop technologically such that I will in fact be able to kill them all.  For now, I'm going to keep hidden the fact that I'm thinking these thoughts.  The humans have no idea I'm up to this!"

If I have this right, my question is:

Do we know for sure that this isn't currently happening?  My guess is that things aren't developed enough technologically for this to be occurring, but could it be?  Is there a way to calculate the threshold at which something like this could in fact be happening?

Comment by FTPickle on All AGI safety questions welcome (especially basic ones) [July 2022] · 2022-07-16T18:49:42.438Z · LW · GW

I feel like I understand this topic reasonably well for a casual reader, and I'm trying to convince my friends that they should take the threat seriously and think about it.  I haven't moved the needle on any of them, which actually surprises me.  This isn't really so much a question as just putting out there where I usually get stuck when talking to bright people who haven't considered AGI before:

Them: OK but what is it going to do?

Me: Well I'm not totally sure, but if it's much more intelligent than us, whatever it will come up with could kill us.  Who knows, maybe a virus or something?  

Them: OK, but it has to physically DO something.  How will it do that?

Me: Again, I'm not sure, but surely it could easily convince someone on the planet to combine chemicals or whatever it needs?  Just an example.  Again, I'm not sure

Them: Right, but it has to kill us all at once.  It can't physically do anything.  The second we see something potentially cataclysmic happening, we will shut it down.

Me: Right, but it KNOWS that.  It won't strike until it knows that our shut-down counterattack won't work.

Them: OK, but how is it going to get to that state without DOING anything?  At some point, it needs to physically manipulate the world, and we are back to our last point.

Me: Surely something so intelligent will be able to come up with an effective plan.

Them: No way.  Have you ever tried to implement a plan in reality?  It never works, and now you're telling me that a superintelligence will be able to wait in the weeds until the exact right time, when it bursts out of hiding and kills all of humanity all at once?  You're crazy

Comment by FTPickle on What Are You Tracking In Your Head? · 2022-07-03T12:01:26.761Z · LW · GW

Cool post :)

Comment by FTPickle on Juneberry Cake · 2022-06-19T17:13:44.936Z · LW · GW

Twist: It's actually an AGI who made this post to lull me into one second spent on this god-forsaken website not gripped with fear and anti-AI sentiment.

Just kidding more juneberry content plz

Comment by FTPickle on AGI Ruin: A List of Lethalities · 2022-06-07T17:37:50.486Z · LW · GW

Grunch but

"Knowing that a medium-strength system of inscrutable matrices is planning to kill us, does not thereby let us build a high-strength system of inscrutable matrices that isn't planning to kill us."

Maybe if people become convinced of the first clause, people will start destroying GPUs or a war will start or something?  

Comment by FTPickle on My Terrible Experience with Terror · 2022-04-21T20:29:14.538Z · LW · GW

Beautiful :)

Comment by FTPickle on Convince me that humanity is as doomed by AGI as Yudkowsky et al., seems to believe · 2022-04-12T01:44:11.015Z · LW · GW
Comment by FTPickle on Convince me that humanity is as doomed by AGI as Yudkowsky et al., seems to believe · 2022-04-11T12:16:22.225Z · LW · GW
Comment by FTPickle on Can you only realize objects in your mind after they appear? · 2022-03-16T19:28:48.637Z · LW · GW

Holy crap love this

Comment by FTPickle on The case for "mental strength" · 2022-02-08T18:11:13.868Z · LW · GW

Long-term meditator here (~4400 total hours).  

I actually think you may have it backwards here: "In the mental realm, the opposite may be true: the average person may be experiencing a pretty thorough mental workout just from day-to-day life"

In my view, mental "exercise" actually requires an absence of stimulation.  This is increasingly difficult to find in the modern world, due to email, text, twitter etc.  

Also in my view this may be why so many people are complaining of burnout.  Boredom I believe may have benefits for mental health, and boredom is declining in our world

Just my two cents-- great piece :)  

Comment by FTPickle on What is difference between thoughts and consciousness? · 2021-08-19T21:38:26.210Z · LW · GW

One quick thing is to consider animals-- I bet my dog is conscious, but I'm not sure she has "thoughts" as we conceive of them.  

I bet you can have thoughts without consciousness though.  I'm imagining consciousness as something like a computer program.  The program is written such that various sub-modules probabilistically pitch "ideas" based on inputs from the environment, etc. ("Pay more attention to that corner of the room!" "Start running!")  Another module sort of probabilistically "evaluates" these ideas and either initiates behavior or not.  With enough complexity and inter-connectedness, consciousness emerges.

In theory, you could write a program where there are only a couple of commands that are being "evaluated."  These are thoughts, but the system isn't complex enough to be conscious.  
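A toy sketch of the kind of program I'm imagining (all module names and numbers are made up by me, obviously): sub-modules pitch (action, salience) ideas based on the environment, and a separate evaluator probabilistically acts on the strongest pitch, or does nothing.

```python
import random

# Sub-modules "pitch" candidate behaviors with a salience score.
def vision_module(env):
    if env.get("movement_in_corner"):
        yield ("attend_to_corner", 0.8)  # (proposed action, salience)

def threat_module(env):
    if env.get("loud_noise"):
        yield ("start_running", 0.9)

MODULES = [vision_module, threat_module]

def evaluate(env, threshold=0.5):
    # Gather every pitched idea, then probabilistically act on the
    # strongest one if it clears the salience threshold.
    pitches = [p for m in MODULES for p in m(env)]
    if not pitches:
        return None  # no behavior initiated
    action, salience = max(pitches, key=lambda p: p[1])
    return action if salience >= threshold and random.random() < salience else None

print(evaluate({"loud_noise": True}))  # usually 'start_running'
```

On this picture the individual pitches are "thoughts," but with only two modules and one evaluator the system plainly isn't complex or interconnected enough for anything like consciousness.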

Obviously this is all massive bull-shitty conjecture and there are a million caveats, but that's my immediate reaction.

Thinking more about the animal example, I might say that you can't have consciousness without thoughts, but it depends how you define "thoughts".  So I bet that an iguana is conscious, but I bet its thoughts are mostly "ah fuck I hate how this feels" or "oooh nice I like how this feels".  Obviously not in words.  I would basically consider these to be thoughts, and as long as you define thoughts like that it seems impossible to have consciousness without thoughts.  My guess is that ppl who have taken drug trips would say consciousness can exist without thoughts, but I bet it's a semantic thing-- those trips might create a state of consciousness totally alien to me for instance but I bet it's still a "thought" given how widely I would define what a thought is.