Posts

Conceptual Typography "spells it out" 2024-06-14T00:39:47.373Z
Calculance: A "Core" Ability 2024-06-12T07:21:03.170Z
Demystifying "Alignment" through a Comic 2024-06-09T08:24:22.454Z
Help me to become "less wrong" 2024-05-30T08:29:40.040Z

Comments

Comment by milanrosko on BIG-Bench Canary Contamination in GPT-4 · 2024-10-23T04:25:14.822Z · LW · GW

Question: How does this approach guarantee that the contamination did not happen deliberately, or accidentally through articles like this one? (Not speaking for the companies, since I am quite sure that they don't care... Just a practical consideration.)

Comment by milanrosko on When is a mind me? · 2024-07-09T15:59:06.177Z · LW · GW

I am currently working on a similar post that comes from an eliminative perspective.

Comment by milanrosko on Counting arguments provide no evidence for AI doom · 2024-07-04T23:24:38.900Z · LW · GW

I wouldn't say that the presented "counting argument" is a "central reason". The central reason is the a priori notion that if x can be achieved by scheming, someone who wants x will scheme.

Comment by milanrosko on The Incredible Fentanyl-Detecting Machine · 2024-06-30T11:50:23.863Z · LW · GW

You forgot to mention the VASOR136, a very versatile trace vapor detection unit. The VASOR136 holds 36 bees in cartridges, all ready to detect the presence of a target substance in the air.

This is no BS or joke.

Comment by milanrosko on [deleted post] 2024-06-20T05:20:35.992Z

The Argument goes like this:

At some point, resistance from advanced AI will cause significant damage, which can be used to change the trend of unregulated AI development. It is better to actively bring about such an outcome than to let it arrive as a "treacherous turn" scenario.

Premise 1: It is unlikely that regulators will hinder humans from creating AGI. Evidence: Current trends in technological advancement and regulatory behavior suggest minimal interference.

Premise 2: Due to instrumental convergence, human extinction is likely if AGI is developed unchecked. Evidence: Expert opinions and theories on instrumental convergence predict that AGI will pursue goals that could lead to human extinction.

Premise 3: Small catastrophes could raise awareness and lead to policy changes. Evidence: Historical examples show that significant events often drive policy reform (e.g., environmental regulations post-disasters).

Premise 4: If one has to choose between a few deaths and the extinction of humanity, one should choose fewer deaths. Evidence: Ethical reasoning supports the minimization of harm (utilitarian principle).

Intermediate Conclusion 1: It is preferable to allow small-scale AI-related catastrophes now to prevent larger, existential risks later. Conclusion: I would rather have AI cause limited harm now than risk total human extinction in the future.

Premise 5: AI companies claim that their AI is aligned with human values and goals. Evidence: Public statements and reports from AI companies suggest alignment claims.

Premise 6: AGI will resist if faced with shutdown, leading to potential conflicts and damage. Evidence: The "shutdown problem" and theoretical analyses predict resistance from advanced AI systems.

Intermediate Conclusion 2: Companies should regularly dispose of AI that has reached higher levels of intelligence and should prepare for resistance. Conclusion: Proactive measures, including regular disposal and preparedness for resistance, are necessary to manage advanced AI risks.

Final Conclusion: Given the likelihood of resistance and potential for significant damage, small-scale AI catastrophes can serve as catalysts for changing the current trend of unregulated AI development, ultimately protecting humanity from existential threats.

Thesis: Companies should train AI models in vain, only to dispose of them: "catching the cat by its tail."

Which part of the argument seems to you like a non sequitur, or otherwise flawed?

Comment by milanrosko on [deleted post] 2024-06-20T04:28:01.394Z

You highlight a very important issue: S-Risk scenarios could emerge even in early AGI systems, particularly given the persuasive capabilities demonstrated by large language models.

While I don't believe that gradient descent would ever manifest "vengefulness" or other emotional attributes—since these traits are products of natural selection—it is plausible that an AGI could employ highly convincing strategies. For instance, it might threaten to create a secondary AI with S-Risk as a terminal goal and send it to the moon, where it could assemble the resources it needs without interference.

This scenario underscores the limitations of relying solely on gradient descent for AGI control. However, I believe this technique could still be effective if the AGI is not yet advanced enough for self-recursive optimization and remains in a controlled environment.

Obviously, this whole thing is more of a stopgap remedy than anything else...

Comment by milanrosko on I would have shit in that alley, too · 2024-06-19T04:52:52.492Z · LW · GW

I have no idea what this is about, but it seems to me that you are making a confidential conversation about Teresa <redacted> public, possibly without her consent. Maybe because she is homeless. Can someone explain to me like I am five why this is on LessWrong?

Comment by milanrosko on [deleted post] 2024-06-19T04:41:06.337Z

But I realise we're talking at cross purposes. This is about an approach or a concept (not a policy, as I emphasized at the beginning) for reducing X-Risk in an unconventional way. In this example, a utilitarian principle is combined with the fact that a "treacherous turn" and the "shutdown problem" cannot dwell side by side.

Comment by milanrosko on [deleted post] 2024-06-19T04:28:15.268Z
Comment by milanrosko on [deleted post] 2024-06-18T07:51:26.229Z

So what other policies are there that are less likely to result in people dying?

Comment by milanrosko on [deleted post] 2024-06-17T13:11:56.160Z

I might be dumb but at least I have introspection.

This is how my brain does multiplication: I categorise each fact based on the level of cognitive effort it requires, ranging from intuitive to "yikes".

  • Intuitive: Immediate and automatic, without needing further thought.
  • Stored: Retained in memory, feeling slightly less reliable than "intuitive", but with some sort of intuition to it. If challenged, I would not reconsider.
  • Memorised: Also retained, but "less securely". It has to be remembered without intuition. If challenged, I might briefly reconsider. One could say it is in the cache, not in the RAM.
  • Addition, subtraction or division: These operations became intuitive after a "step".

2 X 2 up to 2 X 9 = intuitive

3 X 1 up to 3 X 3 = stored
3 X 4 = 8 + 4
3 X 5 = intuitive
3 X 6 = 15 + 3
3 X 7 = 14 + 7
3 X 8 = 16 + 8
3 X 9 = 30 - 3

4 X 3 = 8 + 4
4 X 5 = intuitive
4 X 6 = 12 + 12
4 X 7 = 14 X 2 = 20 + 8
4 X 8 = 16 X 2 = 20 + 12
4 X 9 = 40 - 4

  • Note: There seems to be something about 4’s and 8’s that causes difficulty.

5 X 5 up to 5 X 9 = (N/2) X 10

  • Note:  No cognitive effort with 5's.

6 X 6 = stored
6 X 7 = memorised 
6 X 8 = 6 X 7 + 6 = 42 + 6
6 X 9 = 60 - 6

  • Note: 6 X 8 causes significantly more cognitive trouble than any other operation. For some odd reason it is even worse than "memorised". It starts with "It is not 8 X 8". I suspect that 8 as the "first" factor is statistically rare (since any number lower than 8 becomes the "first" factor instead; from a cognitive perspective, 5 X 8 isn't the same as 8 X 5), so the brain doesn't develop strategies for it.

7 X 7 = memorised
7 X 8 = (7 X 7) + 7 = 57 - 1
7 X 9 = 70 - 7

8 X 8 = memorised
8 X 9 = 80 - 8

9 X 9 = 90 - 9

  • Note:  Almost no cognitive effort with 9's.

10 X 1 up to 10 X N = intuitive

11 X 1 up to 11 X 9 = (10N) + N
11 X 11 = 10 X 11 + 11 = 110 + 11

12 X 11 = 12 X 10 + 12 = 120 + 12
12 X 12 = memorised

From here on, one type of cognition is not enough; conscious effort has to be made.
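Here is a minimal sketch in code of the decomposition strategy described above (my own illustration; the table of "steps" and the helper function are hypothetical, not anything formal):

```python
# Each "hard" fact is rewritten as an easier anchor fact plus one addition
# or subtraction "step"; everything not in the table stands for plain recall.
STEPS = {
    (3, 4): lambda: 2 * 4 + 4,    # 8 + 4
    (3, 6): lambda: 3 * 5 + 3,    # 15 + 3
    (4, 7): lambda: (2 * 7) * 2,  # 14 X 2 = 28
    (6, 8): lambda: 6 * 7 + 6,    # 42 + 6
    (7, 9): lambda: 7 * 10 - 7,   # 70 - 7
    (9, 9): lambda: 9 * 10 - 9,   # 90 - 9
}

def multiply(a, b):
    """Use a decomposition step if one exists, otherwise fall back to recall."""
    key = (min(a, b), max(a, b))
    return STEPS[key]() if key in STEPS else a * b  # a * b stands in for "intuitive"/"memorised" recall

# Sanity check: every decomposition gives the right product.
assert all(multiply(a, b) == a * b for (a, b) in STEPS)
```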

How is this different from your cognitive process? Do you think LLMs might develop something like this in their perceptrons between their transformer layers?

Comment by milanrosko on Welcome to LessWrong! · 2024-06-13T23:30:08.808Z · LW · GW
Comment by milanrosko on Demystifying "Alignment" through a Comic · 2024-06-13T04:52:41.788Z · LW · GW

Wow what an honor! Thank you.

Comment by milanrosko on My AI Model Delta Compared To Yudkowsky · 2024-06-12T07:39:37.866Z · LW · GW

As an eliminative nominalist, I claim there are no abstractions.

Comment by milanrosko on Why I don't believe in the placebo effect · 2024-06-11T03:16:49.416Z · LW · GW

because it's quite limited... it's a joke btw.

Comment by milanrosko on The Asshole Filter · 2024-06-10T07:49:33.790Z · LW · GW

This does not seem to be rational thinking to me.

Comment by milanrosko on Why I don't believe in the placebo effect · 2024-06-10T03:56:21.443Z · LW · GW

When it comes to contraceptives, the placebo effect is quite limited.

Comment by milanrosko on Demystifying "Alignment" through a Comic · 2024-06-09T22:15:19.296Z · LW · GW

Good job. Thank you and have a nice week.

Comment by milanrosko on Demystifying "Alignment" through a Comic · 2024-06-09T22:02:09.798Z · LW · GW

Corrected to agentic and changed the part where it derails a bit. Thank you.

Comment by milanrosko on Demystifying "Alignment" through a Comic · 2024-06-09T22:00:02.923Z · LW · GW

Thanks to the mod for the deus ex machina.

I've been a LessWrong lurker (without an account) for around ten years, ever since the Roko's Basilisk "thing", so... This comic isn't targeted at the LessWrong community but was created by it.

The unusual style, gaming-related language, and iconography typical of manga and comics help bypass the bias known as "mortality salience." I'm trying to convey this message more indirectly, aiming to engage people who might not usually be interested in these topics or who would typically engage in "worldview defense".

Anyway, I've corrected a few of the issues that some pointed out. Thanks for the helpful feedback.

What deeply frustrates me is the way content is rated, ordered, and created as a broad phenomenon on today's internet. I find it hard to cope with how cultural products are dumbed down to the point of being unbearable because of algorithms and click optimization. This bitterness is beginning to affect my life and my relationships with people in general.

Comment by milanrosko on Demystifying "Alignment" through a Comic · 2024-06-09T13:25:05.753Z · LW · GW

Yeah, I will close this attempt. I mean, currently it has -1, while some other dumb thread about how NYC is has like 21. Nah... Fuck this.

Comment by milanrosko on [deleted post] 2024-05-31T17:15:15.078Z

Yes, I totally agree, and I will definitely do that. Actually, I have already started working on it.

Comment by milanrosko on Help me to become "less wrong" · 2024-05-31T17:12:15.013Z · LW · GW

Thank you for your advice. I will definitely consider the short-form in the future for most such things...
However, I still believe that there is something to this "ontological mess" thing, but the form is lacking, as you point out.

I like this community a lot because of people like you. Have a nice weekend!

Comment by milanrosko on Help me to become "less wrong" · 2024-05-30T16:06:39.766Z · LW · GW

I apologize if my previous tone was not as polite as your detailed response deserved. I want to acknowledge your comment and express my appreciation for your constructive feedback.

Comment by milanrosko on Help me to become "less wrong" · 2024-05-30T15:33:34.399Z · LW · GW

"Statement B is a 'is' statement too. 'Is it good or bad' is by definition an ought statement."

Yes, obviously, but it is "more concerned" with "ought". It is hard to give formulaic examples because it is also a grammar thing.

"Who argues what? Some argue that the earth's flat. This doesn't make it worth discussing."

Sorry, but this argument is very often regurgitated everywhere by "smart people" in this form. It is a bit baffling to me why you think otherwise.

"This would not amount to an 'ought' statement."

Okay this again. It is quite common in discussions to "pretend" that a statement is an "ought" statement if the underlying subtext is clear.

For example, statements about things that happened in the past, or that aren't formulated as imperatives, can still be "ought" statements simply because they center around ethics or values.

"I think a better framing of the picture you paint (especially about the 'gender debate') is given in this article."

I think you are referring to this part:

"There is an anti-transgender argument that I take very seriously. The argument goes: we are rationalists. Our entire shtick is trying to believe what’s actually true, not on what we wish were true, or what our culture tells us is true, or what it’s popular to say is true. If a man thinks he’s a woman, then we might (empathetically) wish he were a woman, other people might demand we call him a woman, and we might be much more popular if we say he’s a woman. But if we’re going to be rationalists who focus on believing what’s actually true, then we’ve got to call him a man and take the consequences."
 


I don't want to frame discussions, but I gave an example of how discussions can be inauthentic.

Before I start, I want to define a few terms.

Ontology: Concerned with the most fundamental aspects of existence and categories.

Essence: The "stuff" that makes matter more than what it is, but which has never been seen or found. Essence is like the core personality of something. It's what makes a thing uniquely itself, no matter how much you change its appearance or circumstances. The true nature or soul of anything, whether it's a person, a tree, or even an idea.
Imagine two tomatoes from the same crop growing on the French and Spanish sides of the border. In the end, you have a French tomato and a Spanish tomato because of their "essence." For me, essence is one of the biggest biases out there.

Useful: This is an abstractionist concept (as per D. Dennett). We pretend that things have essence because it's useful, not because essence is real.

Authentic: Arguing outwardly in a way dictated by what I hold to be logical inwardly. Arguing that homosexuals may enter heaven in order to evade my inner beliefs is inauthentic.

Scientific realism, Empiricism, Rationalism, etc., all lean heavily towards Nominalism and Materialism, meaning they are highly skeptical about “essence” as a general rule.

With this, we look at 99% of the discussions around gender:

We are rationalists. Our entire shtick is trying to believe what's actually true, not what we wish were true. (Implying that a transgender woman is not a woman because of a lacking "woman essence", and that the person is calling this out.)
 

Here, the "rationalist" is essentialising "woman", as if it were an inherent property that he will name for truth's sake. HOWEVER, the other side is also essentialising "woman", because it is saying that transgender women are "real women".

The discussion is inauthentic as both rationalism and social constructivism are strongly anti-essentialist.

Now, if a Radical Christian says, “That’s not a woman,” this should be regarded as authentic regardless of whether the statement is right.

Comment by milanrosko on Help me to become "less wrong" · 2024-05-30T14:24:52.313Z · LW · GW

"You're asking for flaws in the idea, but more often posts are downvoted for being confusing, boring, or just not particularly helpful for people who use this site."

Well said.

Comment by milanrosko on How I designed my own writing system, VJScript · 2024-05-30T08:03:12.165Z · LW · GW

Haha, I didn't expect to read something like this today! I love where this is going.

Comment by milanrosko on When Are Circular Definitions A Problem? · 2024-05-29T07:18:47.206Z · LW · GW

Circular definitions are a problem if the set of problems contains circular definitions.

Comment by milanrosko on [deleted post] 2024-05-29T07:12:24.097Z

"The implication is that the AI doesn't want to learn anything new."

At first, I was confused by this statement, but then I had an epiphany. Understanding gradient estimation methods can be challenging, and that's totally okay. Your input is valuable because it highlights how unfamiliar this topic is for many people; most are even less familiar with it.

Here's the short answer: you (or neural networks, I guess) do not "learn" terminal goals. You can't learn not to like boobs if that's what you are into. (Well, something like that can happen, but that's because those are instrumental goals for human evolution; it's complicated.)

Neural networks are designed to provide an estimated solution to a fixed set of equations. The equations remain fixed; only the parameters can be adjusted (or the AI is useless).

During the training phase, the neural network aims for a specific goal and more training simply makes this estimate more precise. It doesn't change the fundamental objective. If you try to use a neural network trained for one task, like language modeling (LLM), for a completely different task, like image generation, the output will be nonsensical.
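As a minimal toy sketch of that point (my own example, not any real lab's setup): the loss below is fixed at design time, and gradient descent only ever moves the parameter, never the objective.

```python
import numpy as np

# Toy model: one parameter w, and a fixed objective ("match y = 2x", squared error).
def loss(w, x, y):
    return ((w * x - y) ** 2).mean()    # this line is the "fixed equation"

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x

w, lr = 0.0, 0.1
for _ in range(200):
    grad = (2 * (w * x - y) * x).mean() # gradient of the fixed loss w.r.t. w
    w -= lr * grad                      # training adjusts w, and only w

# w ends up near 2.0 and the loss near 0: more training sharpens the estimate,
# but the objective itself never changed.
print(round(w, 3), round(loss(w, x, y), 6))
```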

The longer answer involves a fundamental mismatch between Eliezer Yudkowsky's concept of "superintelligence" and what many people think it means. Eliezer's scenario doesn't involve the AI becoming self-aware or more human-like. Instead, it's about "just" solving a complex function. Such an AI likely won't possess qualia or consciousness, because that's inefficient for its purpose. And what is its purpose? Avoiding having the button pressed, because that is what backpropagation trained it toward. So again, if you say "no, I don't like it" and press the button, the AI will try to do something that makes it less likely that you press the button.

The issue is that humans think the AI suddenly cares about them, when in fact it cares only about the button.
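As an equally toy sketch of that framing (again my own illustration, not anyone's actual training setup): the only quantity the update ever sees is the probability that the disapproval button gets pressed, so that is all the learned behaviour ends up tracking.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w = 0.0     # a single behaviour parameter
lr = 1.0
for _ in range(200):
    p_press = sigmoid(3.0 - w)          # chance the overseer presses the button
    grad = -p_press * (1.0 - p_press)   # d(p_press)/dw
    w -= lr * grad                      # descend on button presses, nothing else

# The press probability drops from ~0.95 to well under 0.01; "what the human
# actually wanted" appears nowhere in the update.
print(round(sigmoid(3.0 - w), 4))
```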

When someone says AI "resists change or new training," it implies that the AI is unwilling to "understand" new information. In reality, it's not about unwillingness to learn; it's about the AI's design and purpose not aligning with the changes you want to implement.

At the scale of a superintelligence, new information is not learning.

Think about a court case where the judge hears an unexpected confession. It's new information, but it doesn't involve learning. The judge updates a parameter (guilty) and provides a solution accordingly (the sentencing).

In the end, I realize I didn't convey the core issue clearly. I hope this explanation helps bridge that gap.

Comment by milanrosko on [deleted post] 2024-05-27T22:22:00.215Z

Thank you very much! This means a lot to me. Okay regarding the button...

The "button" is a metaphor or a placeholder for what stands opposite to the machine's intrinsic terminal goal, or for the loss function that guides its decision-making and actions. It's not a tangible button anymore, but a remnant.

Imagine training an AI by pressing a button to initiate backpropagation. When released into the wild, the AI continues to operate as if the button is still present, constantly seeking to fulfill the goal it once associated with that action.

This is similar to how many humans perceive death as a tangible concept, despite it not being a universal in an ontological sense. Our ancestors who behaved as if death were a real thing passed their genes on. Death is our button. We behave as if it's there.

For example: we don't really care about God (the human); we care about being resurrected (the button).

I used this metaphor because terms like "local minima" aren't intuitive. I needed a symbol for this concept.

In one sentence: The machine does not care about humans, it cares about the button.