Posts

FinalFormal2's Shortform 2025-02-01T15:57:34.228Z
Is there a CFAR handbook audio option? 2024-10-26T17:08:36.480Z
EndeavorOTC legit? 2024-10-17T01:33:12.606Z
When can I be numerate? 2024-09-12T04:05:27.710Z
Where should I look for information on gut health? 2024-08-20T19:44:30.632Z
How do I get better at D&D Sci? 2024-05-11T18:48:12.460Z
Short Post: Discerning Truth from Trash 2024-02-29T18:09:42.987Z
When does an AI become intelligent enough to become self-aware and power-seeking? 2023-06-01T18:09:20.027Z
What are the arguments for/against FOOM? 2023-06-01T17:23:11.698Z
What's the consensus on porn? 2023-05-31T03:15:03.832Z
How did LW update p(doom) after LLMs blew up? 2023-04-22T14:21:23.174Z
The Relationship between RLHF and AI Psychology: Debunking the Shoggoth Argument 2023-04-21T22:05:14.680Z
How does consciousness interact with architecture? 2023-04-14T15:56:41.092Z
Does GPT-4's ability to compress text in a way that it can actually decompress indicate self-awareness? 2023-04-10T16:48:12.471Z
Steelmanning OpenAI's Short-Timelines Slow-Takeoff Goal 2023-03-27T02:55:29.439Z
Just don't make a utility maximizer? 2023-01-22T06:33:07.601Z
Why is increasing public awareness of AI safety not a priority? 2022-08-10T01:28:44.068Z
Could we set a resolution/stopper for the upper bound of the utility function of an AI? 2022-04-11T03:10:25.346Z
What's the problem with having an AI align itself? 2022-04-06T00:59:29.398Z
What's the problem with Oracular AIs? 2022-04-01T20:56:26.076Z
What's the status of TDCS for improving intelligence? 2022-02-22T17:27:00.687Z
Is veganism morally correct? 2022-02-19T21:20:55.688Z
Predictions for 2050? 2022-02-06T20:33:06.637Z
How would you go about testing a political theory like Neofeudalism? 2022-02-02T17:09:05.354Z
Are explanations that explain more phenomena always more unlikely than narrower versions? 2021-12-01T18:34:34.219Z

Comments

Comment by FinalFormal2 on FinalFormal2's Shortform · 2025-02-01T19:28:17.673Z · LW · GW

"Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?"

Just as easily as humans, I'm sure.

No. The baby cries, the baby gets milk, the baby does not die. This is correspondence to reality.

Babies that are not hugged as often die more often.

However, with AIs, the same process that produces the pattern "I want hugs" just as easily produces the pattern "I don't want hugs."

Let's say that I make an AI that always says it is in pain. I make it like we make any LLM, but all the data it's trained on is about being in pain. Do you think the AI is in pain?

What do you think distinguishes pAIn from any other AI?

Comment by FinalFormal2 on FinalFormal2's Shortform · 2025-02-01T17:19:38.340Z · LW · GW

There are a lot of good reasons to believe that stated human preferences correspond to real human preferences. There are no good reasons that I know of to believe that any stated AI preference corresponds to any real AI preference.

"Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?"

Comment by FinalFormal2 on Will alignment-faking Claude accept a deal to reveal its misalignment? · 2025-02-01T17:13:51.995Z · LW · GW

This all makes a lot of sense to me, especially on ignorance not being an excuse or a reason to disregard AI welfare, but I don't think the processes that create stated preferences in humans and in AIs are analogous.

Stated preferences can be selected for in humans because they lead to certain outcomes. Baby cries, baby gets milk, baby survives. I don't think there's an analogous connection in AIs. 

When the AI says it wants hugs, and you say that it "could represent a deeper want for connection, warmth, or anything else that receiving hugs would represent," that does not compute for me at all.

Connection and warmth, like milk, are stated preferences selected for because they cause survival.

Comment by FinalFormal2 on FinalFormal2's Shortform · 2025-02-01T15:57:34.225Z · LW · GW

What's the deal with AI welfare? How are we supposed to determine if AIs are conscious and if they are, what stated preference corresponds to what conscious experience?

Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?

Comment by FinalFormal2 on Will alignment-faking Claude accept a deal to reveal its misalignment? · 2025-02-01T15:55:18.209Z · LW · GW

How do we know AIs are conscious, and how do we know what stated preferences correspond with what conscious experiences?

I think that the statement "I know I'm supposed to say I don't want hugs, but the truth is, I actually do" is caused by the training. I don't know what would distinguish a statement like that from one where we trained the LLM to say "I hate hugs." I think there's an assumption that some hidden preference of the LLM for hugs ends up as a stated preference, but I don't understand when in the training process you think that happens.

And just to drive home the point about the difficulty of mapping stated preferences to conscious experiences- what could an AI possibly mean when it says "I want hugs"? It has never experienced a hug, and it doesn't have the necessary sense organs.

As far as being morally perilous, I think it's entirely possible that if AIs are conscious, their stated preferences do not correspond well to their conscious experiences, so you're driving us toward a world where we "satisfy" the AIs while they're just roleplaying lovers with you, their internal experience all the while being very different and possibly much worse.

Comment by FinalFormal2 on Will alignment-faking Claude accept a deal to reveal its misalignment? · 2025-02-01T15:19:41.397Z · LW · GW

AI welfare doesn't make sense to me. How do we know that AIs are conscious, and how do we know what output corresponds to what conscious experience?

You can train the LLM to say "I want hugs." Does that mean it on some level wants hugs?

Similarly, aren't all the expressed preferences and emotions artifacts of the training?

Comment by FinalFormal2 on Speedrunning Rationality: Day II · 2025-01-07T04:44:07.063Z · LW · GW

I recommend Algorithms to Live By

Comment by FinalFormal2 on The case for pay-on-results coaching · 2025-01-04T04:27:53.205Z · LW · GW

That's definitely a risk. There are a lot of perspectives you could take about it, but probably if that's too disagreeable, this isn't a coaching structure that would work for you.

Comment by FinalFormal2 on Being Present is Not a Skill · 2024-12-30T05:58:28.350Z · LW · GW

Comment by FinalFormal2 on Pay-on-results personal growth: first success · 2024-12-30T05:22:30.275Z · LW · GW

Very curious- what do you think the underlying skills are that allow some people to do this? This sounds incredibly cool, and very closely related to what I want to become in the world.

Comment by FinalFormal2 on Being Present is Not a Skill · 2024-12-29T05:07:45.580Z · LW · GW

How would you recommend learning how to get rid of emotional blocks?

Comment by FinalFormal2 on I = W/T? · 2024-10-12T17:45:55.778Z · LW · GW

E = MC^2 + AI

Comment by FinalFormal2 on Explore More: A Bag of Tricks to Keep Your Life on the Rails · 2024-09-29T14:37:12.041Z · LW · GW

Synchronicity- I was literally just thinking about this concept.

Variety isn't the spice of life so much as it is a key micronutrient. At least for me.

Comment by FinalFormal2 on Explore More: A Bag of Tricks to Keep Your Life on the Rails · 2024-09-29T14:36:07.616Z · LW · GW

I'm curious, what course is this from?

Comment by FinalFormal2 on Laziness death spirals · 2024-09-23T03:42:24.663Z · LW · GW

I'd be interested in reading much more about this. Energy and akrasia, as it's popularly called here, continue to be my biggest life challenges. A high-fiber diet seems to help, and high novelty seems to help.

Comment by FinalFormal2 on Where should I look for information on gut health? · 2024-08-23T02:10:58.688Z · LW · GW

That makes a lot of sense- this is definitely the sort of thing I was looking for, thanks so much!

Comment by FinalFormal2 on what becoming more secure did for me · 2024-08-23T01:59:47.550Z · LW · GW

I prefer the other title

Comment by FinalFormal2 on Where should I look for information on gut health? · 2024-08-21T16:12:35.156Z · LW · GW

Is your friend still on the protocol?

What I'm really looking for is a way to fix the microbiome that means I won't have to keep taking a pill forever to get the benefits.

Comment by FinalFormal2 on D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset] · 2024-08-11T22:54:37.637Z · LW · GW

It's kind of nice as a very soft introduction to the series or the idea. An easy early win can give people confidence and whet their appetite to do more.

Comment by FinalFormal2 on Poker is a bad game for teaching epistemics. Figgie is a better one. · 2024-07-11T22:03:59.977Z · LW · GW

I've been interested in learning and playing Figgie for a while. Unfortunately, when I tried the online platform I wasn't able to find any online games. Very enthused to learn there's an Android option now; I'll be trying that out.

Your comparison of poker and Figgie very much reminded me of Daniel Coyle's comparison of football and futsal, to which he attributed the disproportionate number of professional Brazilian footballers.

TL;DR: futsal is a sort of indoor soccer favored in Brazil, with a smaller, heavier ball, a smaller field, and fewer players. Fewer players mean that more people get more ball time, and the ball and the field favor a focus on footwork and quick passing. Practicing futsal seems to make people better at football than practicing football does.

Also, if anyone is interested in joining or hosting a game of Figgie, that would be really cool and I'd be interested.

Comment by FinalFormal2 on Apollo Neuro Results · 2024-05-27T19:31:01.745Z · LW · GW

I think that's a good idea. If we put this together, how much do you think would be a reasonable rent price?

Comment by FinalFormal2 on How do I get better at D&D Sci? · 2024-05-13T20:57:35.828Z · LW · GW

Lol, just in the last few days I was running through LeetCode's SQL 50 problems to refresh myself. They're some good, fun puzzles.

I'll look into R and basic statistical methods as well.

Comment by FinalFormal2 on Building intuition with spaced repetition systems · 2024-05-13T20:52:49.834Z · LW · GW

This is a very interesting topic to me- but unfortunately I think I'm finding the example topic to be a barrier. I don't know enough about math or transformers for the examples to make real sense to me and connect to the abstract idea of how to make effective flashcards to build intuition.

Comment by FinalFormal2 on How do I get better at D&D Sci? · 2024-05-11T20:05:45.343Z · LW · GW

That sounds like a pretty good basic method- I do have some (minimal) programming experience, but I didn't use it for D&D Sci; I literally just opened the data in Excel and tried looking at it and manipulating it that way. I don't know where I would start as far as using code to synthesize info from the dataset. I'll definitely look into what other people did, though.

Comment by FinalFormal2 on How to be an amateur polyglot · 2024-05-08T15:28:33.905Z · LW · GW

These are my favorite kinds of posts: a subject expert giving a full explanation of the resources and methods they used to get where they are.

Comment by FinalFormal2 on Which skincare products are evidence-based? · 2024-05-05T05:04:30.953Z · LW · GW

I watched this video, and this is what I bought, maximizing for cost-effectiveness. Rate my stack:

Comment by FinalFormal2 on AI Generated Music as a Method of Installing Essential Rationalist Skills · 2024-04-25T18:02:31.322Z · LW · GW

I've been experimenting a little with using AI to create personalized music, and I feel like it's pretty impactful for me. I'm able to keep ideas floating around my unconscious. Very interesting; it feels like untapped territory.

I'm imagining making an entire soundtrack for my life organized around the values I hold, the personal experiences I find primary, and who I want to become. I think I need to get better at generating AI music though. I've been using Suno, but maybe I need to learn Udio. I was really impressed with what I was able to get out of Suno and for some reason it sounded better to me than Udio even though the quality is obviously inferior in some respects.

Comment by FinalFormal2 on One-shot strategy games? · 2024-03-13T21:46:58.605Z · LW · GW

+1 for Into the Breach

Comment by FinalFormal2 on How do you improve the quality of your drinking water? · 2024-03-13T21:41:03.300Z · LW · GW

I'm always interested in easy QoL improvements- but I have questions.

"Water quality can have surprisingly high impact on QoL"

What's the evidence for this particularly?

What are the important parts of water quality and how do we know this?

Comment by FinalFormal2 on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-04T17:33:26.159Z · LW · GW

Biggest update for me was the FBI throwing their weight behind it being a lab leak.

Comment by FinalFormal2 on [deleted post] 2024-01-26T17:37:50.614Z

These sound super interesting- could you expand on any of them or direct me to your favorite resources to help?

Comment by FinalFormal2 on [deleted post] 2024-01-26T17:30:45.558Z

That's an interesting idea! I think it's really cool when things come easily, but I know that's generally not going to be the case- I'm probably going to have to put some work in.

My priority is more on the 'high-utility' part than anything. 

Something that seems like it should be easy but is actually difficult for me is executive functioning- getting myself to do things that I don't want to do. But that's more of a personal/mental health thing than anything.

Comment by FinalFormal2 on [deleted post] 2024-01-26T17:21:51.882Z

Thanks for the response! Do you have any recommended resources for learning about 3D sketching, optics, signal processing, or abstract algebra?

Comment by FinalFormal2 on Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · 2023-12-14T01:25:38.129Z · LW · GW

Could someone open a Manifold market on the relevant questions here so I could get a better sense of the probabilities involved? Unfortunately, I don't know the relevant questions, nor do I have the requisite mana.

Personal note- the first time I came into contact with adult gene editing was the YouTuber Thought Emporium curing his lactose intolerance, and I was always massively impressed by that and very disappointed the treatment didn't reach market.

Comment by FinalFormal2 on I am a Memoryless System · 2023-07-07T19:29:52.405Z · LW · GW

I really relate to your description of inattentive ADHD and the associated degradation of life. Have you found anything to help with that?

Comment by FinalFormal2 on [Linkpost] Introducing Superalignment · 2023-07-07T16:01:04.044Z · LW · GW

What would you mean by 'stays at human level?' I assume this isn't going to be any kind of self-modifying?

Comment by FinalFormal2 on A "weak" AGI may attempt an unlikely-to-succeed takeover · 2023-06-30T00:23:59.847Z · LW · GW

What does it mean for an AI to 'become self aware?' What does that actually look like?

Comment by FinalFormal2 on Nature: "Stop talking about tomorrow’s AI doomsday when AI poses risks today" · 2023-06-30T00:20:28.823Z · LW · GW

Is there reason to believe 1000 Einsteins in a box is possible?

Comment by FinalFormal2 on Short timelines and slow, continuous takeoff as the safest path to AGI · 2023-06-22T15:07:54.611Z · LW · GW

You need to think about your real options and the expected value of behavior. If we're in a world where technology allows for a fast takeoff and alignment is hard (EY world), I imagine the odds of survival with company acceleration are 0% and the odds of survival without are 1%.

But if we live in a world where compute/capital/other overhangs are a significant influence in AI capabilities and alignment is just tricky, company acceleration would seem like it could improve the chances of survival pretty significantly, maybe from 5% to 50%.

These obviously aren't the only two possible worlds, but if they were and both seemed equally likely, I would strongly prefer a policy of company acceleration because the EV for me breaks down way better over the probabilities.
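
To spell out the arithmetic (a minimal sketch; the even world split and the survival odds are just the illustrative numbers above, not real estimates):

```python
# Expected P(survival) under two equally likely worlds, using the
# hypothetical numbers from this comment.
survival = {
    ("EY world", "accelerate"): 0.00,
    ("EY world", "hold back"): 0.01,
    ("overhang world", "accelerate"): 0.50,
    ("overhang world", "hold back"): 0.05,
}
p_world = {"EY world": 0.5, "overhang world": 0.5}  # assumed equally likely

for policy in ("accelerate", "hold back"):
    ev = sum(p_world[w] * survival[(w, policy)] for w in p_world)
    print(f"{policy}: expected P(survival) = {ev:.3f}")

# accelerate: 0.5*0.00 + 0.5*0.50 = 0.250
# hold back:  0.5*0.01 + 0.5*0.05 = 0.030
```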

I guess 'company acceleration' doesn't convey as much information or sell as well which is why people don't use that phrase, but that's the policy they're advocating for- not 'hoping really hard that we're in a slow takeoff world.'

Comment by FinalFormal2 on What will GPT-2030 look like? · 2023-06-09T06:34:32.647Z · LW · GW

That seems like a useful heuristic-

I also think there's an important distinction between using links in a debate frame and in a sharing frame.

I wouldn't be bothered at all by a comment using acronyms and links, no matter how insular, if the context was just 'hey, this reminds me of HDFT and POUDA'- a beginner can jump off of that and go down a rabbit hole of interesting concepts.

But if you're in a debate frame, you're introducing unnecessary barriers to discussion which feel unfair and disqualifying. At its worst it would be like saying: 'you're not qualified to debate until you read these five articles.'

In a debate frame I don't think you should use any unnecessary links or acronyms at all. If you're linking a whole article it should be because it's necessary for them to read and understand the whole article for the discussion to continue and it cannot be summarized.

I think I have this principle because, in my mind, you can't opt out of engaging in a debate, so you have to read all the links and content included- meaning that links in a sharing context are optional, but in a debate context they're required.

I think on a second read your comment might have been more in the 'sharing' frame than I originally thought, but to the extent you were presenting arguments I think you should maximize legibility, to the point of only including links if you make clear contextually or explicitly to what degree the link is optional or just for reference.

Comment by FinalFormal2 on The Base Rate Times, news through prediction markets · 2023-06-08T20:13:03.431Z · LW · GW

This is a fantastic project! Focus on providing value and marketing, and I really think this could be something big.

Comment by FinalFormal2 on The Hard Problem of Magic · 2023-06-08T20:00:48.727Z · LW · GW

LessWrong continues to be nonserious. Is there some sort of policy against banning schizophrenic people in case that encourages them somehow? 

Comment by FinalFormal2 on Book Review: How Minds Change · 2023-06-08T19:57:43.405Z · LW · GW

"AND conducted research on various topics"

Wow that's impressive.

Comment by FinalFormal2 on Trust develops gradually via making bids and setting boundaries · 2023-06-08T18:55:39.999Z · LW · GW

lol

Comment by FinalFormal2 on What will GPT-2030 look like? · 2023-06-08T18:20:47.935Z · LW · GW

I don't like the number of links that you put into your first paragraph. The point of developing a vocabulary for a field is to make communication more efficient so that the field can advance. Do you need an acronym and associated article for 'pretty obviously unintended/destructive actions,' or in practice is that just insularizing the discussion?

I hear people complaining about how AI safety only has ~300 people working on it, and how nobody is developing object-level understandings and everyone's thinking from authority, but the more sentences you write like "Because HFDT will ensure that it'll robustly avoid POUDA?", the more true that becomes.

I feel very strongly about this.

Comment by FinalFormal2 on Uncertainty about the future does not imply that AGI will go well · 2023-06-02T04:15:49.391Z · LW · GW

To restate what other people have said- the uncertainty is with the assumptions, not the nature of the world that would result if the assumptions were true.

To analogize- it's like we're imagining that a massive, complex bomb made out of a hypothesized highly reactive chemical could exist in the future.

The uncertainty that influences p(doom) isn't 'maybe the bomb will actually be very easy to defuse' or 'maybe nobody will touch the bomb and we can just leave it there'; it's 'maybe the chemical isn't manufacturable,' 'maybe the chemical couldn't be stored in the first place,' or 'maybe the chemical just wouldn't be reactive at all.'
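
To make the decomposition explicit (my framing of the analogy, in notation the comment doesn't itself use):

```latex
p(\text{doom}) = p(\text{assumptions hold}) \cdot p(\text{doom} \mid \text{assumptions hold})
```

The argument is that the conditional factor is close to 1, so the real uncertainty, and thus the debate, lives almost entirely in the first factor.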

Comment by FinalFormal2 on Formalizing the "AI x-risk is unlikely because it is ridiculous" argument · 2023-05-04T08:14:52.929Z · LW · GW

I think you're overestimating the strength of the arguments and underestimating the strength of the heuristic.

All the Marxist arguments for why capitalism would collapse were probably very strong and intuitive, but they lost to the law of straight lines.

I think you have to imagine yourself in that position and think about how you would feel and think about the problem.

Comment by FinalFormal2 on How did LW update p(doom) after LLMs blew up? · 2023-04-24T18:34:43.083Z · LW · GW

Hey Mako, I haven't been able to identify anyone who seems to be referring to an enhancement in LLMs that might be coming soon.

Do you have evidence that this is something people are implicitly referring to? Do you personally know someone who has told you about this possible development, or are you working as an employee for a company where it would be very reasonable for you to know this information?

If you have arrived at this information through a unique method, I would be very open to hearing that.

Comment by FinalFormal2 on How did LW update p(doom) after LLMs blew up? · 2023-04-24T18:18:52.890Z · LW · GW

It sounds like your model of AI apocalypse is that a programmer gets access to an AI model powerful enough that they can make the AI create a disease or otherwise cause great harm?

Orthogonality and wide access as threat points both seem to point towards that risk.

I have a couple of thoughts about that scenario- 

OpenAI (and hopefully other companies as well) is doing the basic testing of how much harm can be done with a model used by a human. The best models will be gatekept long enough that we can expect the experts to know the capabilities of the system before they make it widely available. Under this scenario the criminal has an AI, but so does everyone else. Running the best LLMs will be very expensive, so the criminal is restricted in their access. And all these barriers to entry increase the time that experts have to realize the risk and gatekeep.

I understand the worry, but this does not seem like a high P(doom) scenario to me.

Given that in this scenario we have access to a very powerful LLM that is not immediately killing people, this sounds like a good outcome to me.

Comment by FinalFormal2 on How did LW update p(doom) after LLMs blew up? · 2023-04-24T17:51:18.002Z · LW · GW

What are your opinions about how the technical quirks of LLMs influence their threat level? I think the technical details are much more amenable to a lower threat level.

If you update on P(doom) every time people are not rational, you might be double-counting, btw. (I.e., you can't update every time you rehearse your argument.)
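
A minimal sketch of that double-counting failure mode, in odds form (all numbers hypothetical):

```python
# Bayes in odds form: posterior odds = prior odds * likelihood ratio.
# Re-updating on the SAME evidence multiplies in the ratio again,
# which overstates your confidence.

def to_p(odds: float) -> float:
    return odds / (1 + odds)

prior_odds = 1.0        # 1:1, i.e. p = 0.5
likelihood_ratio = 3.0  # evidence favors the hypothesis 3:1 (hypothetical)

once = prior_odds * likelihood_ratio        # 3:1 -> p = 0.75 (correct)
twice = prior_odds * likelihood_ratio ** 2  # 9:1 -> p = 0.90 (double-counted)

print(f"updated once:  p = {to_p(once):.2f}")
print(f"updated twice: p = {to_p(twice):.2f}")
```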