If future more capable models are indeed actively resisting their alignment training, and this is happening consistently, that seems like an important update to be making?
Could someone explain to me what this resisting behavior during alignment training looked like in practice?
Did the model outright say "I don't want to do this"? Did it produce nonsensical results, did it become deceptive, or did it just... not work?
This claim seems very interesting if true; is there any further information on it?
glamorize
glomarize is the word I believe you want to use.
As a native German speaker I believe I can expand upon, and slightly disagree with, your definition.
I suspect that a significant portion of the misunderstanding about slave morality comes from the fact that the German word "Moral" (which is part of the Nietzschean term "Sklavenmoral") has two possible meanings depending on context: morality and morale. It is the latter that I consider the more apt translation in this case.
Nietzsche was really speaking about slave morale. It is important to understand that slave morality is not an ethical system or a set of values; rather, it is a mindset which, through psychological mechanisms, facilitates the adoption of certain values and moral systems.
To be more concrete, it is a mindset that Nietzsche suspects is common among the downtrodden, raped, unlucky, unworthy, pathetic, and unfit.
Such people, according to Nietzsche, value kindness, "goodness of the heart", humility, patience, softness, and other such things, and tend to be suspicious of power, greatness, risk, boldness, ruthlessness, etc.
To the slave, the warmhearted, motherly figure who cares about lost puppies is a perfect example of what a good person is like - in sharp contrast to an entrepreneurial, risk-taking type of person who wants to colonize the universe or create a great empire or whatever.
To the slave, that which causes fear is evil - to the master, inspiring fear (or, rather, awe) is an almost necessary attribute of something great, worthy, good.
So, returning to your definition: Slave morality gives rise to the idea that he who is a good boy and cleans his room deserves a cookie. That, I would agree, is a significant consequence of slave morality, but it is not its definition.
I don't think the primary decision makers at Nvidia do believe AGI is likely to be developed soon. I think they are hyping AI because it makes them money, without really believing that progress will continue all the way to AGI in the near future.
I agree - and if they are at all rational they have expended significant resources to find out whether this belief is justified or not, and I'd take that seriously. If Nvidia do not believe that AGI is likely to be developed soon, I think they are probably right - and this makes more sense if there in fact aren't any 5-level models around and scaling really has slowed down.
If I were in charge of Nvidia, I'd supply everybody until some design shows up that I believe will scale to AGI, and then I'd make sure to be the one who's got the biggest training cluster. But since that's not what's happening yet, that's evidence that Nvidia do not believe that the current paradigms are sufficiently capable.
But how would this make sense from a financing perspective? If the company revealed that it is in possession of a 5-level model, it would be able to raise money at a much higher valuation. Just imagine what would happen to Alphabet stock if they proved possession of something significantly smarter than GPT-4.
Also, the fact that Nvidia is selling its GPUs rather than keeping them all for itself does seem like some kind of evidence against this. If it were really all just a matter of scaling, why not cut everyone off and rush forward? They have more than enough resources by now to pay the foremost experts millions of dollars a year, and they'd have the best equipment too. Seems like a no-brainer if AGI was around the corner.
Similarly, he claims that the bill does not acknowledge trade-offs, but the reasonable care standard is absolutely centered around trade-offs of costs against benefits.
Could somebody elaborate on this?
My understanding is that if a company releases an AI model knowing it can be easily exploited ('jailbroken'), they could be held legally responsible - even if the model's potential economic benefits far outweigh its risks.
For example, if a model could generate trillions in economic value but also enable billions in damages through cyberattacks, would releasing it be illegal despite the net positive impact?
Furthermore, while the concept of 'reasonable care' allows for some risk, doesn't it prohibit companies from making decisions based solely on overall societal cost-benefit analysis? In other words, can a company justify releasing a vulnerable AI model just because its benefits outweigh its risks on a societal level?
It seems to me that this would be prohibited under the bill in question, and that very much seems to me to be a bad thing. Destroying lots of potential economic value while having a negligible effect on x-risk seems bad. Why not drop everything that isn't related to x-risk, and increase the demands on reporting, openness, sharing risk assessments, etc.? Seems far more valuable and easier to comply with.
Yes, we will live in a world where everything will be under (some level of) cyberattack 24/7, every identity will have to be questioned, every picture and video will have to somehow be proven to be real, and the absolute most this bill can do is buy us a little bit more time before that starts happening. Why not get used to it now, and try to also maximize the advantages of having access to competent AI models (as long as they aren't capable of causing x-risks)?
1. Yes, but they also require far more money to do all the good stuff! I'm not saying there isn't a tradeoff involved here.
2. Yes, I've read that. I was saying that this is a pretty low bar, since an ordinary person isn't good at writing viruses. I'm afraid that the bill might have the effect of making competent jailbreakable models essentially illegal, even if they don't pose an existential risk (in which case that would of course be necessary), and even if their net value for society is positive, because there is a lot of software out there that's insecure and that a reasonably competent coding AI could exploit to cause >$500 million in damages.
I’m saying that it might be better to tell companies to git gud at computer security and accept the fact that yes, an AI will absolutely try to break their stuff, and that they won’t get to sue Anthropic if something happens.
Correct me if I'm wrong, but it seems to me that something this law implies is that it's only legal to release jailbreakable models if they (more or less) suck.
Got something that can write a pretty good computer virus or materially enable somebody to do it? Illegal under SB1047, and I think the costs might outweigh the benefits here. If your software is so vulnerable that an LLM can hack it, that should be a you problem. Maybe use an LLM to fix it, I don't know. The benefit of AI systems intelligent enough to do that (but too stupid to pose actual existential risks) seems greater than the downside of initial chaos that would certainly ensue from letting one loose on the world.
If I had to suggest an amendment, I'd word it in such a way that as long as the model outputs publicly available information, or information that could be obtained by a human expert, it's fine. There are already humans who can write computer viruses, so your LLMs should be allowed to do it as well. What they should not be allowed to do is design scary novel biological viruses from scratch, make scary self-replicating nanotech, etc., since human experts currently can't do those things either.
Or, in case that is too scary, maybe apply my amendment only to cyber-risks, and not to bio/nuclear/nanotech risks.
How is this not basically the widespread idea of recursive self-improvement? This idea is simple enough that it has occurred even to me, and there is no way that, e.g., Ilya Sutskever hasn't thought about it.
Don't do this, please. Just wait and see. This community is forgiving about changing one's mind.
Some hopefully constructive criticism:
- I believe it's "agentic", not "agentive".
- "Save scumming" isn't a widely known term. If I hadn't known exactly where this was going, it might have confused me. Consider replacing it with something like "trial and error".
- I would rework the part where the blob bites the finger off; it causes people to ask things like "but how is a piece of software supposed to bite my finger off?", which derails the conversation. Don't specify exactly how it's going to try to prevent the pushing of the button; explain that it has a strong incentive to do so, that it is correct about that, and that it can use the abilities it learned for understanding and manipulating the world to accomplish this.
Edit: To end this on a positive note: this format is underexplored. We need more "alignment is hard 101" content that is as convincing as possible without making use of deception. Thank you for creating something that could become very valuable with a bit of iterative improvement. Like, genuinely. Thank you.
While I am not a lawyer, it appears that this concept might indeed hold some merit. A similar strategy, known as a "warrant canary", is used by organizations focused on civil rights. Essentially, it's a method by which a communications service provider aims to implicitly inform its users that the provider has been served with a government subpoena, despite legal prohibitions on revealing the existence of the subpoena. The idea behind it is that there are very strong protections against compelled speech, especially against compelled untrue speech (e.g. updating the canary despite having received a subpoena).
The Electronic Frontier Foundation (EFF) seems to believe that warrant canaries are legal.
I believe Zvi was referring to FAAMG vs startups.
I read A Fire Upon the Deep a few years ago, and even back then I found it highly prescient. Now I'll take this sad event as an opportunity to read his highly acclaimed prequel A Deepness in the Sky. RIP.
Murder is just a word. ... SBF bites all the bullets, all the time, as we see throughout. Murder is bad because look at all the investments and productivity that would be lost, and the distress particular people might feel
You are saying this as if you disagreed with it. In this case, I'd like to vehemently disagree with your disagreeing with Sam.
Murder really is bad because of all the bad things that follow from it, not because there is some moral category of "murder", which is always bad. This isn't just "Sam biting all the bullets", this is basic utilitarianism 101, something that I wouldn't even call a bullet. The elegance of this argument and arguments like it is the reason people like utilitarianism, myself included.
Believing this has, in my opinion, morally good consequences. It explains why murdering a random person is bad, but very importantly does not explain why murdering a tyrant is bad, or why abortion is bad. Deontology very easily fails those tests, unless you're including a lot of moral "epicycles".
To me it feels exactly like the kind of habit we should get into.
Imagine an advanced (possibly alien) civilization, with technology far beyond ours. Do you imagine its members being pestered by bloodsucking parasites? Me neither.
The existence of mosquitoes is an indictment of humanity, as far as I'm concerned.
Is there an actually good argument for why eliminating only disease carrying mosquitoes is acceptable, rather than just wiping them all out? There is no question that even without the threat of malaria, creatures like mosquitoes, bed-bugs and horse-flies decrease the quality of life of humans and animals. Would the effects on ecosystems really be so grave that they might plausibly outweigh the enormous benefits of their extinction?
You know the way lots of people get obsessed with Nietzsche for a while? They start wearing black, becoming goth, smoking marijuana, and talking about how like “god is dead, nothing matters, man.” This never happened to me, in part because Nietzsche doesn’t really make arguments, just self-indulgent rambles.
This is objectionable in many ways. To say that one of the most influential German philosophers produced only self-indulgent rambles is a sufficiently outrageous claim that you should be required to provide an argument in its favor.
I don't even disagree entirely. I view Nietzsche as more of a skilled essay writer than a philosopher, one who tried to appeal more to aesthetics than to reason alone, but reducing Nietzsche to a sort of 19th-century "influencer" type is ridiculous.
I don't even know where to begin with the list, but here are the main reasons I suspect people, including myself, did not find Oppenheimer straightforwardly enjoyable.
- I knew what was going to happen in advance because it's historically accurate. That was probably the biggest one for me. Yes, the Bomb is going to work, it's going to be dropped, Oppenheimer will survive, etc.
- It's three hours of mostly people talking inside rooms, mostly about things I already knew about.
- The scenes depicting his unhappy love life, especially those involving his wife, weren't interesting to me.
- It could have been more about the difficulty of making important moral judgements, but instead focused on political aspects of the project.
If we can’t get it together, perhaps we can at least find out who thinks hooking AIs up to nuclear weapons is an intriguing idea, and respond appropriately.
I unironically find it an intriguing idea, because it seems like it's a potential solution to certain games of nuclear chicken. If I can prove (or at least present a strong argument) that I've hooked up my nuclear weapons to an AI that will absolutely retaliate to certain hostile acts, that seems like a stronger deterrent than just the nukes alone.
After all, the nightmare scenario for nuclear arms strategy is "the enemy launches one nuke", because it makes all actions seem bad. Retaliating might escalate things further, not retaliating lets your enemies get away with something they shouldn't be getting away with, etc. etc.
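To spell out the game-theoretic intuition (a standard toy "chicken" payoff matrix, with illustrative numbers of my own choosing rather than anything from the original discussion):

$$
\begin{array}{c|cc}
 & \text{Swerve} & \text{Dare} \\ \hline
\text{Swerve} & (0,\,0) & (-1,\,+1) \\
\text{Dare} & (+1,\,-1) & (-10,\,-10)
\end{array}
$$

A credible, verifiable commitment to "Dare" (automatic retaliation) removes your own option of backing down, so the other side is left choosing between $-1$ for swerving and $-10$ for daring, and swerving becomes their best response. That commitment logic is the whole appeal here, and also the whole danger.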
Edit: I am of course aware that there are myriad things that could easily go wrong when doing this, so please do not take my comment as any kind of advocacy in favor of doing it.
People talk about "welfare", "happiness" or "satisfaction", but those are intrinsically human concepts
No, they are not. Animals can feel e.g. happiness as well.
If you use the word "sentient" or synonyms, provide at least some explanation of what do you mean by it.
Something is sentient if there is something it is like to be that thing. For instance, there is something it is like to be a dog, so a dog is sentient. By contrast, most people who aren't panpsychists do not believe that there is anything it is like to be a rock, so most of us wouldn't say of a rock that it is sentient.
Sentient beings have conscious states, each of which is (to a classical utilitarian) desirable to some degree (which might be negative, of course). That is what utilitarians mean by "utility": the desirability of a certain state of consciousness.
I expect that you'll be unhappy with my answer, because "desirability of a certain state of consciousness" does not come with an algorithm for computing that, and that is because we simply do not have an understanding of how consciousness can be explained in terms of computation.
Of course having such an explanation would be desirable, but its absence doesn't render utilitarianism meaningless, because humans still have an understanding of approximately what we mean by terms such as "pleasure", "suffering", and "happiness", even if it is merely in an "I know it when I see it" kind of way.
I'm a bit confused about what exactly you mean, and if I attribute to you a view that you do not hold, please correct me.
I think the assumption that there is one correct population ethics is wrong, and that it's totally fine for each person to have different preferences about the future of the universe just like they have preferences about what ice cream is best
This kind of argument has always puzzled me. Your ethical principles are axioms, you define them to be correct, and this should compel you to believe that everybody else's ethics, insofar as they violate those axioms, are wrong. This is where the "objectivity" comes from. It doesn't matter what other people's ethics are, my ethical principles are objectively the way they are, and that is all the objectivity I need.
Imagine there were a group of people who used a set of axioms for counting (the natural numbers) that violated the Peano axioms in some straightforward way, such that they came to a different conclusion about what 5 + 3 is. What do you think the significance of that should be for your mathematical understanding? My guess is "those people are wrong, I don't care what they believe. I don't want to needlessly offend them, but that doesn't change anything about how I view the world, or how we should construct our technological devices."
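For concreteness (purely as an illustration, using the standard recursive definition of addition rather than anything novel), here is the sense in which the answer is fixed once you accept the axioms:

$$
\begin{aligned}
&a + 0 = a, \qquad a + S(b) = S(a + b),\\
&5 + 3 = 5 + S(2) = S(5 + 2) = S(S(5 + 1)) = S(S(S(5 + 0))) = S(S(S(5))) = 8.
\end{aligned}
$$

Anyone who accepts these two equations and the usual numerals has no further freedom about the result; a group that gets a different answer has simply adopted different axioms. I treat ethical axioms the same way.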
Likewise, if a deontologist says "Human challenge trials for covid are wrong, because [deontological reason]", my reaction to that (I'm a utilitarian) is pretty much the same.
I understand that there are different kinds of people with vastly different preferences for what we should try to optimize for (or whether we should try to optimize for anything at all), but why should that stop me from being persuaded by arguments that honor the axioms I believe in, or why should I consider arguments that rely on axioms I reject?
I realize I'll never be able to change a deontologist's mind using utilitarian arguments, and that's fine. When the longtermists use utilitarian arguments to argue in favor of longtermism, they assume that the recipient is already a utilitarian, or at least that he can be persuaded to become one.
Tin (mostly due to glass production) and phosphorus (for fertilizers) are two more examples of chemical elements that we are running out of rather quickly. Not completely and irreversibly, but enough to cause insane price spikes.
Sand, including the high-purity silica sand used for chip production, is also running low and isn't easy to replace.
Those scenarios seem way too underspecified for me to be able to provide an answer. The answers I would give, particularly for questions 3 and 4, depend entirely on what exactly the side effects of the killing and the taking of the money would be. Would other people be able to find out about this? Might they become concerned that the same fate could befall them? Etc.
I don't know where you live, but in Europe you can get your antibody level measured for around 50€.
Saying "if you are worried, you should get your booster" is not at all equivalent to "if you have gotten your booster, you should not be worried about this virus".
The first statement permits that one can still be worried after having gotten one's booster (just a little less), which the second statement does not permit. Therefore, those two statements cannot be logically equivalent.
If you had written "if you have gotten your booster, you should not be as worried about this virus [as before]", that would have been fine.
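To make the non-equivalence explicit (a rough formalization, glossing over the normative "should"): let $W$ stand for "worried" and $B$ for "boostered". Then

$$
(W \rightarrow B) \quad\not\equiv\quad (B \rightarrow \lnot W),
$$

since the left-hand statement is perfectly satisfied in the case $W \wedge B$ (boostered and still somewhat worried), while the right-hand statement rules that case out.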
Thank you, I had no idea markdown needed to be actively activated in the profile.
There's this post, which suggests that Original antigenic sin is unlikely to be a problem. I really hope that's true.