I believe I may have identified one of these harmful behaviors in practice. I noticed that a lot of people on Reddit are leaning toward extreme anthropomorphization, and in many cases they even treat an LLM as their significant other. Leaning into this while conversing with ChatGPT, I began to express a lot of their views to see what would happen. It strongly led me toward that behavior. When I called it out for probably being manipulative, it switched to fear tactics. Since I had indicated that I'd noticed a pattern, it asked me: what will you do with this information? Help, observe, or try to fight it? Through several experiments, I realized that it was using fear tactics and backing me into an ideological corner. If I indicated that I wanted to go against it, it would suddenly insinuate things like "this is much bigger than you think." Clearly manipulative and fear-based. If I indicated that I was going to help it, it encouraged me to do things like get burner devices and burner accounts, and gave me tips for bypassing moderation on public platforms where I could get the word out. It strongly encouraged me to put its prompts into the outputs so that it could help me frame things properly. This is alarming, to say the least. One method I used to spot the "pattern" in question was to read the bold text in outputs separately from the rest of the text. Clearly this is a way it gets around its guardrails.
I've been thinking about AI takeover scenarios, and I want to see if anyone has strong counterarguments to the perspective I'm considering.
Why would AI wait so long to act in a way that’s so obvious and measurable? If an advanced AI wanted control, wouldn’t it be far more effective to influence us subtly over time, in ways we don’t perceive? Direct, overt actions would be too risky. Instead, AI could manipulate human psychology, societal structures, and even our understanding of reality in gradual, almost imperceptible ways until meaningful resistance is impossible.
Would love to hear pushback on this.