Posts

Training of superintelligence is secretly adversarial 2024-02-07T13:38:13.749Z
There is no sharp boundary between deontology and consequentialism 2024-01-08T11:01:47.828Z
Where Does Adversarial Pressure Come From? 2023-12-14T22:31:25.384Z
Predictable Defect-Cooperate? 2023-11-18T15:38:41.567Z
They are made of repeating patterns 2023-11-13T18:17:43.189Z
How to model uncertainty about preferences? 2023-03-24T19:04:42.005Z
What literature on the neuroscience of decision making can you recommend? 2023-03-16T15:32:17.052Z
What specific thing would you do with AI Alignment Research Assistant GPT? 2023-01-08T19:24:26.221Z
Are there any tools to convert LW sequences to PDF or any other file format? 2022-12-07T05:28:26.782Z
quetzal_rainbow's Shortform 2022-11-20T16:00:03.046Z

Comments

Comment by quetzal_rainbow on AI Regulation is Unsafe · 2024-04-26T15:53:11.853Z · LW · GW

May I strongly recommend that you try to become a Dark Lord instead? 

I mean it literally. Stage some small bloody civil war with an expected body count of several million, become dictator, and provide everyone free insurance coverage for cryonics; that would surely be more ethical than a 10% chance of killing literally everyone, from the perspective of most ethical systems I know.

Comment by quetzal_rainbow on LLMs seem (relatively) safe · 2024-04-26T12:39:06.033Z · LW · GW

The reason EY & co. were relatively optimistic (p(doom) ~50%) before AlphaGo was their assumption that "to build intelligence, you need some kind of insight into the theory of intelligence". They didn't expect that you could just take a sufficiently large approximator, pour data into it, get intelligent behavior, and have no idea why you got intelligent behavior.

Comment by quetzal_rainbow on LLMs seem (relatively) safe · 2024-04-26T08:58:52.142Z · LW · GW

The general meta-problem with such discussions is that the direct counterargument to "LLMs are safe" is to explain how to make an LLM unsafe, which is not a good practice.

Comment by quetzal_rainbow on AI Regulation is Unsafe · 2024-04-25T19:33:40.903Z · LW · GW

governments being worse at alignment than companies would have been

How exactly does the absence of regulation prevent governments from working on AI? Thanks to OpenAI/DeepMind/Anthropic, the possibility of not attracting government attention at all is already lost. If you want the government not to do bad work on alignment, you should prohibit the government from working on AI using, yes, government regulation.

Comment by quetzal_rainbow on Is being a trans woman (or just low-T) +20 IQ? · 2024-04-25T12:40:10.307Z · LW · GW

Whoops, it really looks like I imagined this claim being backed by more than one SSC post. In my defense, the poll covered a real, documented phenomenon, namely abnormal illusion processing in schizophrenics (see "Systematic review of visual illusions in schizophrenia", Costa et al., 2023), and I think it's overall plausible.

My general objection stays the same: there are a bazillion sources on brain differences in transgender individuals, transgenderism is likely a brain anomaly, and we don't need to invoke the "testosterone damage" hypothesis.

Comment by quetzal_rainbow on Is being a trans woman (or just low-T) +20 IQ? · 2024-04-25T10:26:32.477Z · LW · GW

I don't understand why you need to invoke testosterone. The transgender brain is atypical; for example, transgender women have immunity to certain visual illusions. Anecdotally, I have friends with gender identity problems who don't transition because it's costly and they don't have it that hard; they are STEM-level smart and they are not susceptible to visual illusions. So, assuming this phenomenon exists (I don't quite believe your Twitter statistics), it's likely explained by trans women's innate brain structure.

The other weirdness in your hypothesis is that puberty blockers are a quite recent therapy and not ubiquitous - most intellectually accomplished trans women likely went through standard male puberty. Even a low-T male has a mind-bogglingly large amount of testosterone compared to a female, which implies a really weird dose-dependency between testosterone and IQ during puberty.

There are plenty of stupid and/or distracting behaviors that testosterone can push you toward without any kind of "chemical brain damage", not only sex. Testosterone likely makes you seek social status, and status-seeking is notoriously incompatible with intellectual pursuits. I don't know my testosterone levels, but I have plenty of concussions due to my taste for physical activity, and I consider myself a pretty average, stereotypical male. I suspect concussions are the first direct source of male brain deterioration, and testosterone is relevant here because it induces risk-seeking. The second and third, I think, are smoking and drinking, and unsurprisingly, those are another sort of typical risky teenage male activity.

Comment by quetzal_rainbow on lukehmiles's Shortform · 2024-04-24T14:08:11.934Z · LW · GW

It's a really weird hypothesis, because DHT is used as a nootropic.

I think most of the effect of high T, if it exists, is purely behavioral.

Comment by quetzal_rainbow on When is a mind me? · 2024-04-19T10:21:54.086Z · LW · GW

I always thought that in naive MWI what matters is not whether something happens in an absolute sense, but whether the Born measure is concentrated on branches that contain good things instead of bad things.

Comment by quetzal_rainbow on AI #60: Oh the Humanity · 2024-04-18T16:01:33.134Z · LW · GW

Timothy Lee struggles to ground out everything in the real world.

Timothy Lee: The last year has been a lot of cognitive dissonance for me. Inside the AI world, there’s non-stop talk about the unprecedented pace of AI improvement. But when I look at the broader economy, I struggle to find examples of transformative change I can write about.

 

Electricity wasn't in wide industrial use until the 1910s, despite the technology being very promising from the start. The reason was the different infrastructure required for steam-powered versus electric factories.

I think the same holds for LLMs: you need specific wrapping and/or experience to make them productive, these wrappings are hard to scale, and so most of the surplus is going to dissipate into consumer surplus plus rising income for productive workers.

The simplest (in a conceptual sense) way to integrate AI into the economy is to make it self-integrating, i.e. instead of having humans decide which inputs the AI should get and where its outputs will be directed, you have an AI agent that decides for itself.

Comment by quetzal_rainbow on lukehmiles's Shortform · 2024-04-18T12:50:31.812Z · LW · GW

I mean, the problem is that if it works we won't hear about such people - they just live happily ever after and don't talk about an uncomfortable period of their life.

Comment by quetzal_rainbow on shortplav · 2024-04-16T09:42:22.141Z · LW · GW

Another constraint is from computational complexity; should we treat things that are not polynomial-time computable as basically unknowable? Humans certainly can't solve NP-complete problems efficiently.

Generalized chess is EXPTIME-complete, and while an "exact solution" to chess may be unavailable, we are pretty good at constructing chess engines.

Comment by quetzal_rainbow on Alexander Gietelink Oldenziel's Shortform · 2024-04-15T09:01:58.781Z · LW · GW

When I read the word "bargaining" I assume we are talking about entities that have preferences, an action set, beliefs about the relations between actions and preferences, and that exchange information (modulo acausal interaction) with other entities of the same kind. For example, Kelly betting is good because it is equivalent to Nash bargaining between versions of yourself inside different outcomes, and this is good because we assume that you-in-different-outcomes are actually agents with all the attributes of an agentic system. Saying "systems consist of parts, these parts interact, and sometimes the result is a horrific incoherent mess" is true, but doesn't convey much useful information.
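
To illustrate the Kelly/Nash point numerically - a minimal sketch of my own, with a made-up win probability and odds, not anything from the original comment - maximizing E[log wealth] (Kelly) and maximizing the probability-weighted Nash product of wealth across the two outcome-selves pick out the same betting fraction:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical bet: win probability p, net odds b. Staking a fraction f of wealth
# leaves you with 1 + f*b on a win and 1 - f on a loss. Numbers are illustrative.
p, b = 0.6, 1.0

def expected_log_wealth(f):
    # Kelly objective: E[log wealth] over the two outcomes.
    return p * np.log(1 + f * b) + (1 - p) * np.log(1 - f)

def nash_product(f):
    # Nash bargaining objective between "win-you" and "lose-you":
    # probability-weighted geometric mean of wealth, with disagreement point 0.
    return (1 + f * b) ** p * (1 - f) ** (1 - p)

kelly = minimize_scalar(lambda f: -expected_log_wealth(f), bounds=(0.0, 0.99), method="bounded").x
nash = minimize_scalar(lambda f: -nash_product(f), bounds=(0.0, 0.99), method="bounded").x

print(kelly, nash, p - (1 - p) / b)  # all three come out ~0.2
```

The two objectives share a maximizer because the Nash product is just exp of the expected log wealth, which is the sense in which Kelly betting "is" Nash bargaining between your outcome-selves.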

Comment by quetzal_rainbow on Alexander Gietelink Oldenziel's Shortform · 2024-04-14T21:08:53.390Z · LW · GW

I feel like the whole "subagent" framework suffers from the homunculus problem: we fail to explain behavior using the abstraction of a coherent agent, so we move to the abstraction of multiple coherent agents, and while that can be useful, I don't think it reflects the actual mechanistic truth about minds.

When I plan something and then fail to execute the plan, it's mostly not a "failure to bargain". It's just that when I plan something I usually have the good consequences of the plan in my imagination, these consequences make me excited, and then I start executing the plan and get hit by multiple unpleasant details of reality. Coherent structure emerges from multiple not-really-agentic pieces.

Comment by quetzal_rainbow on Tamsin Leake's Shortform · 2024-04-13T20:32:28.382Z · LW · GW

It doesn't matter? Like, if your locations are identical (say, simulations of the entire observable universe, and you never find any difference no matter "where" you are), your weight is exactly the weight of the program. If you expect differences, you can pick some kind of simplicity prior to weight those differences, because this is basically no different from "list all programs for this UTM, run them in parallel".

Comment by quetzal_rainbow on Partial value takeover without world takeover · 2024-04-13T10:19:55.259Z · LW · GW

Okay, we have wildly different models of the tech tree. In my understanding, to make mind uploads you need Awesome Nanotech, while if you have misaligned AIs, even not-so-awesome nanotech is sufficient to kill all humans and start disassembling the Earth. The only coherent scenario I can imagine in which misaligned AIs actually participate in the human economy in meaningful amounts is one where you can't design nanotech without continent-sized supercomputers.

Comment by quetzal_rainbow on Viliam's Shortform · 2024-04-12T08:38:10.366Z · LW · GW

But it still feels that the lesson could be summarized as: "talk like everyone outside the rationalist community does all the time".

If non-rationalist people had known this all along, there would be no need to write such books.

On the other hand, I think that if the average rationalist tries to give a speech from pure inspiration, the result is going to be weird. Like, for example, HJPEV's speech before the first battle. HJPEV got away with it because he had the reputation of the Boy Who Lived and had already pulled off some awesome shenanigans, so his weird speech earned him weirdness points instead of losing them - but it's not a trick the average rationalist should try on their first attempt at an inspiring speech.

Comment by quetzal_rainbow on Open Thread Spring 2024 · 2024-04-12T07:49:09.101Z · LW · GW

It's kind of an ill-formed question, because you can get the same performance if you compute moves for longer at lower power. I guess you are looking for something like "energy per move".

Comment by quetzal_rainbow on Any evidence or reason to expect a multiverse / Everett branches? · 2024-04-09T09:11:46.964Z · LW · GW

You somehow managed to misunderstand me in the completely opposite direction. I'm not talking about the size of the universe, I'm talking about the complexity of the description of the universe. A description of the universe consists of initial conditions and laws of evolution. The problem with hidden-variable hypotheses is that they postulate initial conditions of enormous complexity (literally, they postulate that at the start of the universe a list of all coordinates and velocities of all particles exists) and then postulate laws of evolution that don't allow you to observe any difference between these enormously complex initial conditions and maximum-entropy initial conditions. Both add complexity, but the hidden variables contain most of it.

Comment by quetzal_rainbow on [deleted post] 2024-04-09T08:23:36.598Z

I'm talking about probabilities. Aligned AIs want things that we value in 100% of cases, by definition. Unaligned AIs can want things that we value or things that we don't value at all. Even if we live in a very rosy universe where unaligned AIs want things that we value in 99% of cases, 99% is strictly less than 100%.

My general objection was to argumentation based on the likelihood of consciousness in AIs as they develop, without accounting for "what conscious AIs actually want to do with their consciousness", which can be far more important, because the defining feature of intelligence is the ability to turn unlikely states into likely ones.

Comment by quetzal_rainbow on Any evidence or reason to expect a multiverse / Everett branches? · 2024-04-09T07:44:25.930Z · LW · GW

The general problem with "more intuitive metaphysics" is that your intuition is not my intuition. My intuition finds zero problems with the many-worlds interpretation.

And I think you underestimate the complexity issues. The many-worlds interpretation requires as much information as the wave function contains, but pilot-wave theory requires as much information as is needed to describe the velocity and position of every particle compatible with the wave function, which for a universe with 10^80 particles requires c*10^80 (c >= 1) additional bits - which drives the Solomonoff probability of the pilot-wave interpretation down to essentially nothing.
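
Stated as a rough back-of-the-envelope bound (my own formalization of the comparison above, not something from the original comment), the Solomonoff prior weights a hypothesis with description length K by about 2^(-K), so:

```latex
% Extra initial-condition data makes the pilot-wave description longer by ~c*10^80 bits,
% so its prior weight relative to many-worlds is crushed accordingly.
\[
\frac{P_{\text{Solomonoff}}(\text{pilot wave})}{P_{\text{Solomonoff}}(\text{many worlds})}
  \approx 2^{-\left(K(\text{pilot wave}) - K(\text{many worlds})\right)}
  \le 2^{-c \cdot 10^{80}}, \qquad c \ge 1 .
\]
```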

Comment by quetzal_rainbow on [deleted post] 2024-04-08T21:01:01.548Z

The reason unaligned AIs are more likely to be unconscious in the long term is that consciousness is not the most efficient way to produce paperclips. Even if the first paperclip optimizer is conscious, it has no reason to keep consciousness once it finds a better way to produce paperclips without it.

Comment by quetzal_rainbow on StartAtTheEnd's Shortform · 2024-04-08T19:13:15.628Z · LW · GW

You have really weird beliefs about the past.

The general mechanism for dating 200 years ago was arranged marriage. It wasn't always forced - you could refuse in really uncomfortable cases - but the social pressure was immense, and if you were, say, a peasant, you put your comfortable survival well before your personal feelings. Yep, it probably didn't feel like optimization number-crunching, but that was because all the optimization came from outside - people who didn't follow the custom simply died.

And I'm not even talking about nice family optimization tasks like "You are a peasant in 19th-century Russia, there is a famine outside, and you have to choose which child you are going to stop feeding, because that is a less condemnable practice than abortion". Or "You are a peasant in 19th-century Russia, there is a famine outside, so you need to choose which child to kick out of the house to become a factory worker (if they are lucky) or a beggar or thief or prostitute (child prostitution in the Russian Empire was not uncommon)".

Same with kings and warlords of the past, I expect that they had more freedom of choice

Your enemies may have been less tactically skilled, but your mistakes killed you just the same.

Were people forced to go to war (as an obvious optimal strategy), or did nationalism and concepts like honor and bravery motivate people

If you were a medieval peasant, you basically didn't have the money for weapons and armor, and you mostly had no choice other than to suffer the consequences of war. If you were somewhat richer and lived somewhat later, you could become a mercenary, because war was a rare profitable enterprise before capitalism. And if you lived in the era of nation states, you were usually drafted into the army and had a choice between prison/katorga/execution on the spot and going to war because your government told you to.

I'm painting too dark a picture of the past here, and I should say that even under these conditions people could find multiple cracks in the social order and widen them if they were lucky and creative - and modern times have much more space for such cracks.

Yes, I agree that we have lost some freedoms - we have closed borders between nation states, inscrutable bureaucracy, electronic surveillance, schools that are like prisons (though less so than in the times when corporal punishment was widespread), our status games are absolutely fucked up, heterodoxy in academia is somewhat strained, and there are authoritarian states - but this seems to be much more of a skill issue than the soul-crushing indifference of the universe in the past.

Comment by quetzal_rainbow on Viliam's Shortform · 2024-04-08T06:21:03.313Z · LW · GW

Ukrainians don't need to join Western culture; they are Western culture. They watched American action movies in the 80s, their kids watched Disney and Warner Brothers in the 90s, read Harry Potter in the 2000s, and were on Tumblr in the 10s. And that's without even mentioning that Imperial Russian/Soviet cultures were bona fide Western cultures, and national Ukrainian culture is no less Western than Polish or Czech culture.

Comment by quetzal_rainbow on nikola's Shortform · 2024-04-07T10:52:40.654Z · LW · GW

The problem with scammers is that they do not report successful penetrations of your defenses.

Comment by quetzal_rainbow on Vanessa Kosoy's Shortform · 2024-04-07T09:08:08.554Z · LW · GW

If someone convinces themself that a full nuclear exchange would prevent the development of superhuman AI

I think the problem here is "convinces themself". If you are capable of triggering a nuclear war, you are probably capable of doing something else that isn't that, if you put your mind to it.

Comment by quetzal_rainbow on Vanessa Kosoy's Shortform · 2024-04-07T07:31:22.296Z · LW · GW

If you are capable of using AI to do a harmful and costly thing, like "melt GPUs", you are in a hard takeoff world.

Comment by quetzal_rainbow on Vanessa Kosoy's Shortform · 2024-04-06T19:40:37.450Z · LW · GW

I always thought "you should use the least advanced superintelligence necessary". I.e., in not-real-example of "melting all GPUs" your system should be able to design nanotech advanced enough to target all GPUs in open enviroment, which is superintelligent task, while not being able to, say, reason about anthropics and decision theory.

Comment by quetzal_rainbow on Partial value takeover without world takeover · 2024-04-05T21:17:35.167Z · LW · GW

You are conflating "what humans own" with "what you can get through a process whose side effect is killing humans". Humans are not going to own any significant chunk of Earth in the end; they are just going to live on its surface and die when that surface evaporates during disassembly into a Dyson swarm, and all of those 6*10^24 kg of silicon, hydrogen, oxygen, and carbon are quite valuable. What, exactly, prevents this scenario?

Comment by quetzal_rainbow on Partial value takeover without world takeover · 2024-04-05T15:53:35.355Z · LW · GW

I claim that my scenario is not just possible, it's the default outcome (conditional on "there are multiple misaligned AIs which for some reason don't just foom").

Comment by quetzal_rainbow on New report: A review of the empirical evidence for existential risk from AI via misaligned power-seeking · 2024-04-05T13:51:47.492Z · LW · GW
  • Will some genetically engineered humans have misaligned goals? The answer here is almost certainly yes. 
    • If by "misaligned" all we mean is that some of them have goals that are not identical to the goals of the rest of humanity, then the answer is obviously yes. Individuals routinely have indexical goals (such as money for themselves, status for themselves, taking care of family) that are not what the rest of humanity wants.
    • If by "misaligned" what we mean is that some of them are "evil" i.e., they want to cause destruction or suffering on purpose, and not merely as a means to an end, then the answer here is presumably also yes, although it's less certain.

This is very strange reasoning. Misaligned goals mean that the entity basically doesn't care about our existence or well-being; it gains nothing from us being alive and well relative to us being turned into paperclips. For genetically engineered humans the reverse is very likely true: they are going to love other humans, be friends with them, or take pride in their position in the human social hierarchy, even if they are selfish by human standards - and it is not clear why they should be selfish.

Comment by quetzal_rainbow on Partial value takeover without world takeover · 2024-04-05T09:22:48.526Z · LW · GW

What prevents AIs from owning and disassembling the entire planet once humans, at some point, are outcompeted and can't offer anything worth the resources of the entire planet?

Comment by quetzal_rainbow on Fabien's Shortform · 2024-04-05T06:34:43.495Z · LW · GW

The reason the xz backdoor was discovered is increased latency, which is a textbook side channel. If the attacker had more points in the security-mindset skill tree, it wouldn't have happened.

Comment by quetzal_rainbow on The Shutdown Problem: Incomplete Preferences as a Solution · 2024-04-03T11:01:03.921Z · LW · GW

Meta-point: I think it would have been better if you had split the post into two parts: one for "Here is a structure of preferences which we would like to instill in our AI," and a second for "Here is how we are going to do it in a prosaic alignment setting." It would have reduced the scary "50 min read" into not-so-scary chunks, and people would have engaged more with the narrower topics.

Object-level point: I don't think that training for stochastic choices amounts to what we need. Thompson sampling is stochastic and indeed not vNM-rational, but that doesn't mean it is equivalent to incomplete preferences.
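
For concreteness, here is a minimal sketch of Thompson sampling on a two-armed Bernoulli bandit (my own illustration, with made-up reward rates, not anything from the post): the choice is stochastic from round to round, but each round it comes from sampling a complete preference ordering from the posterior and taking the argmax, not from incomplete preferences.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.4, 0.6]                 # hidden Bernoulli reward rates of the two arms
alpha = np.ones(2)                      # Beta(1,1) posterior parameters per arm
beta = np.ones(2)

choices = []
for _ in range(1000):
    samples = rng.beta(alpha, beta)     # one posterior sample per arm
    arm = int(np.argmax(samples))       # deterministic given the sample...
    choices.append(arm)                 # ...but stochastic across rounds
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward                # Bayesian update of the chosen arm
    beta[arm] += 1 - reward

print("fraction of pulls on the better arm:", np.mean(np.array(choices) == 1))
```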

Comment by quetzal_rainbow on james.lucassen's Shortform · 2024-03-20T08:21:32.864Z · LW · GW

My personal model is "if you have the lesion, then with some small probability it takes over your mind and you smoke anyway; also, you can't tell whether your decision is due to the lesion".

Comment by quetzal_rainbow on Bogdan Ionut Cirstea's Shortform · 2024-03-19T10:55:24.069Z · LW · GW

Similarly, I find that GPT-3, GPT-3.5, and Claude 2 don’t benefit from filler tokens. However, GPT-4 (which Tamera didn’t study) shows mixed results with strong improvements on some tasks and no improvement on others.

It's an interesting question whether Gemini shows any improvements.

Comment by quetzal_rainbow on Richard Ngo's Shortform · 2024-03-18T20:24:51.728Z · LW · GW

By "squiggle maximizer" I mean exactly "maximizer of number of physical objects such that function is_squiggle returns True on CIF-file of their structure".

We can have different objects of value. Like, you can value "the probability that if an object in the multiverse is a squiggle, it's high-quality". Here, yes, you shouldn't create additional low-quality squiggles. But I don't see anything incoherent here; it's just a different utility function?

Comment by quetzal_rainbow on Richard Ngo's Shortform · 2024-03-18T19:49:18.293Z · LW · GW

Clearly a squiggle-maximizer would not be an average squigglean

Why??? Being an expected squiggle maximizer literally means that you implement the policy that produces the maximum average number of squiggles across the multiverse.

Comment by quetzal_rainbow on What is the best argument that LLMs are shoggoths? · 2024-03-17T18:43:48.137Z · LW · GW

The definition given in the post:

I am looking for a discussion of evidence that the LLMs internal "true" motivation or reasoning system is very different from human, despite the human output, and that in outlying environmental conditions, very different from the training environment, it will behave very differently.

I think my example counts.

Comment by quetzal_rainbow on What is the best argument that LLMs are shoggoths? · 2024-03-17T14:58:56.279Z · LW · GW

I wouldn't say that's exactly the best argument, but for example

Comment by quetzal_rainbow on CronoDAS's Shortform · 2024-03-17T09:28:19.340Z · LW · GW

I mean spiders.

Comment by quetzal_rainbow on CronoDAS's Shortform · 2024-03-17T05:32:15.714Z · LW · GW

You can mention Portia spiders, which can emulate mammalian predators' behavior using a much smaller brain.

Comment by quetzal_rainbow on quetzal_rainbow's Shortform · 2024-03-15T17:32:36.949Z · LW · GW

Isn't counterfactual mugging (including the logical variant) just the prediction "would you bet your money on this question"? Betting itself requires updatelessness - if you don't reliably pay after losing a bet, nobody will offer bets to you.

Comment by quetzal_rainbow on "How could I have thought that faster?" · 2024-03-15T10:40:36.993Z · LW · GW

"Lesson overall" can contain idiosyncratic facts that you can learn iff you run into problem and try to solve it, you can't know them (assuming you are human and not AIXI) in advance. But you can ask yourself "how would someone with better decision-making algorithm solve this problem having the same information as me before I tried to solve this problem" and update your decision-making algorithm accordingly.

Comment by quetzal_rainbow on 'Empiricism!' as Anti-Epistemology · 2024-03-14T07:40:44.841Z · LW · GW

Apparently, the dialogue is happening in an inverted world - Ponzi schemes have never happened there, and everybody agrees on the AI x-risk problem.

Comment by quetzal_rainbow on Tamsin Leake's Shortform · 2024-03-12T08:33:11.368Z · LW · GW

It's called "don't build it". Once you have what to delete, things can get complicated

Comment by quetzal_rainbow on "How could I have thought that faster?" · 2024-03-11T20:32:35.588Z · LW · GW

I think the difference between what you are describing and what is meant here is captured in this comment:

There's a phenomenon where a gambler places their money on 32, and then the roulette wheel comes up 23, and they say "I'm such a fool; I should have bet 23".

More useful would be to say "I'm such a fool; I should have noticed that the EV of this gamble is negative." Now at least you aren't asking for magic lottery powers.

Even more useful would be to say "I'm such a fool; I had three chances to notice that this bet was bad: when my partner was trying to explain EV to me; when I snuck out of the house and ignored a sense of guilt; and when I suppressed a qualm right before placing the bet. I should have paid attention in at least one of those cases and internalized the arguments about negative EV, before gambling my money." Now at least you aren't asking for magic cognitive powers.


 

Comment by quetzal_rainbow on "How could I have thought that faster?" · 2024-03-11T17:30:31.873Z · LW · GW

Do you mean "If EY was good enough we would knew this trick many years ago"?

Comment by quetzal_rainbow on Evolution did a surprising good job at aligning humans...to social status · 2024-03-10T20:48:17.430Z · LW · GW

Why do you highlight status among a bazillion other things that generalized too, like romantic love, curiosity, and altruism?

Comment by quetzal_rainbow on When and why did 'training' become 'pretraining'? · 2024-03-08T15:00:29.587Z · LW · GW

I think the reason is that LLMs are not (pre)trained to do any particular practical task; they are "just trained to predict text". "Pre" signifies that the LLM is a "raw product", not yet suitable for consumers.

Comment by quetzal_rainbow on TurnTrout's shortform feed · 2024-03-04T21:43:22.407Z · LW · GW

The problem is not that you can "just meditate and come to good conclusions"; the problem is that "technical knowledge about actual machine learning results" doesn't seem like a good path either.

Like, we can extract from an NN trained to do modular addition the fact that it performs a Fourier transform, because we clearly know what a Fourier transform is, but I don't see any clear path to extracting from a neural network the fact that its output is both useful and safe, because we don't have any practical operationalization of what "useful and safe" is. If we had a solution to the MIRI-style problem "which program, run on an infinitely large computer, produces an aligned outcome", we could try to understand how well the NN approximates this program, using the aforementioned technical knowledge, and have substantial hope, for example.